A new Persian Text Summarization Approach based on Natural Language Processing and Graph Similarity

Hosseinikhah, Tayyebeh; Ahmadi, Abbas; Mohebi, Azadeh

doi:10.35050/JIPM010.2018.084


Abstract

A significant amount of available information is stored in textual databases, which contain large collections of documents from diverse sources such as news, articles, books, emails and web pages. The growing visibility and importance of this class of information motivates our work on better automatic tools for evaluating textual resources.

Automatic text summarization is one way to save users' time. Extractive summarization selects the most important sentences of a text, shortening the input while preserving the topics it covers and the subjects it discusses.
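To illustrate what extraction means in practice, the following Python sketch implements a generic word-frequency extractive summarizer. It is not the method proposed in this paper. It assumes the input is already split into sentences and tokenizes by whitespace (real Persian text would need proper tokenization and stemming), and the function name `extractive_summary` is hypothetical.

```python
from collections import Counter


def extractive_summary(sentences: list[str], k: int) -> list[str]:
    """Return the k highest-scoring sentences, kept in original order.

    Hypothetical helper: each sentence is scored by the average document
    frequency of its whitespace-separated tokens (a generic heuristic,
    not the paper's scoring function).
    """
    tokenized = [s.lower().split() for s in sentences]
    freq = Counter(tok for toks in tokenized for tok in toks)

    def score(i: int) -> float:
        toks = tokenized[i]
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    # Rank sentence indices by score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(i), i))
    chosen = sorted(ranked[:k])  # restore document order
    return [sentences[i] for i in chosen]
```

Restoring document order after ranking keeps the extract readable, since the selected sentences appear in the same sequence as in the source text.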

In this paper, we seek to improve the accuracy of extracted summaries by combining natural language processing and text mining techniques. By modifying these algorithms and the sentence-scoring measures, we obtain higher accuracy than previously used techniques.

Part-of-speech tagging is used to compute an importance coefficient for each word. This helps the system pick more meaningful words and phrases, which in turn improves its accuracy.
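The abstract does not specify the coefficients, so the sketch below assumes a hypothetical table of POS weights (nouns highest, function words lowest) and shows how a POS-derived coefficient could multiply a word's frequency. The tag names (`N`, `V`, `ADJ`, `ADV`; any other tag such as `P` falls back to a default), the weight values and both function names are illustrative assumptions, and the input is assumed to be already tagged by some Persian POS tagger.

```python
from collections import Counter

# Hypothetical importance coefficients per POS tag (illustrative values only).
POS_WEIGHTS = {"N": 1.0, "ADJ": 0.7, "V": 0.5, "ADV": 0.3}
DEFAULT_WEIGHT = 0.1  # e.g. prepositions, conjunctions, other function words


def weighted_term_scores(tagged_tokens: list[tuple[str, str]]) -> dict[str, float]:
    """Score each word as frequency x POS coefficient.

    `tagged_tokens` is a list of (word, tag) pairs. If a word occurs with
    several tags, its highest coefficient is used.
    """
    freq = Counter(word for word, _ in tagged_tokens)
    coef: dict[str, float] = {}
    for word, tag in tagged_tokens:
        weight = POS_WEIGHTS.get(tag, DEFAULT_WEIGHT)
        coef[word] = max(coef.get(word, 0.0), weight)
    return {word: freq[word] * coef[word] for word in freq}


def sentence_score(tagged_sentence: list[tuple[str, str]],
                   term_scores: dict[str, float]) -> float:
    """Average term score of the words in a tagged sentence."""
    if not tagged_sentence:
        return 0.0
    total = sum(term_scores.get(word, 0.0) for word, _ in tagged_sentence)
    return total / len(tagged_sentence)
```

Multiplying rather than adding the coefficient means a frequent function word cannot outrank a moderately frequent noun merely by repetition.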

Graph-based similarity methods are used to select sentences, and adjusting sentence weights after each selection step addresses the redundancy problem.
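The abstract gives neither the exact graph measure nor the weight-update rule. The following sketch therefore assumes cosine similarity over bag-of-words vectors, scores each sentence by its weighted degree in the similarity graph, and after each pick multiplies every remaining score by (1 - similarity to the selected sentence). This is a generic greedy scheme, not the authors' algorithm, and the names `cosine` and `select_sentences` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def select_sentences(sentences: list[str], k: int) -> list[int]:
    """Greedy graph-based selection with redundancy-driven re-weighting.

    Returns the indices of the chosen sentences in selection order.
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Edge weights of the sentence similarity graph (no self-loops).
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    # Initial score of each node: its weighted degree.
    score = [sum(row) for row in sim]
    chosen: list[int] = []
    remaining = set(range(n))
    while remaining and len(chosen) < k:
        # Highest score wins; ties go to the earlier sentence.
        best = max(remaining, key=lambda i: (score[i], -i))
        chosen.append(best)
        remaining.remove(best)
        # Down-weight sentences that overlap with the one just selected.
        for j in remaining:
            score[j] *= 1.0 - sim[best][j]
    return chosen
```

For instance, with the sentences "a b", "a b", "a c" and "c d" and k = 2, the function returns [0, 2]: the exact duplicate at index 1 has its score multiplied by zero once index 0 is chosen, so it is never selected.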

Results are evaluated on a Persian corpus using standard measures such as precision and recall.
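As a minimal sketch of these two measures, assume both the system summary and a reference summary are represented as sets of extracted sentence indices (the paper may compute them at a different granularity, for example over words). Precision is the fraction of system sentences that appear in the reference; recall is the fraction of reference sentences the system recovered. The function name `precision_recall` is hypothetical.

```python
def precision_recall(system: set[int], reference: set[int]) -> tuple[float, float]:
    """Sentence-level precision and recall of an extractive summary.

    precision = |system & reference| / |system|
    recall    = |system & reference| / |reference|
    Empty sets yield 0.0 instead of raising ZeroDivisionError.
    """
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall
```

For a system summary of sentences {1, 2, 3, 4} against a reference of {2, 4, 6}, two sentences overlap, giving a precision of 2/4 and a recall of 2/3.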

Keywords

Extractive Summarization

Natural Language Processing

Text Mining

Part of Speech Tagging

Similarity Graph


Iranian Journal of Information Processing and Management

Volume 33, Issue 2 - Serial Number 92
Winter 2018
Pages 885-914


Receive Date 10 December 2022
Revise Date 14 February 2023

