Developing a Recommender System Based on Case-Based Reasoning for Indexing Persian Scientific Documents

Mohebi, Azadeh; Fakhrzadeh, Azadeh; Zarinbal, Marzieh

doi:10.22034/jipm.2023.704737

Article type: Research article

Authors

Azadeh Mohebi ¹

Azadeh Fakhrzadeh ²

Marzieh Zarinbal ¹

¹ Iranian Research Institute for Information Science and Technology (IranDoc); Tehran, Iran

² Iranian Research Institute for Information Science and Technology (IranDoc); Tehran, Iran

Abstract

Keyword extraction is one of the most important steps in the document indexing process. Keywords are conceptual descriptors that can be used in searching, retrieving, and disseminating information. In databases of scientific documents, such as the Ganj scientific database of the Iranian Research Institute for Information Science and Technology, keywords play a more important role, and assigning specialized keywords is more challenging because these databases contain specialized documents from different scientific fields. Manual indexing is very time-consuming, and given the growing volume of scientific documents being produced and registered, the process needs to be faster. Intelligent automated methods for suggesting and assigning keywords are therefore necessary. Statistical and semantic analysis of documents and machine learning are among the methods widely used in many scientific databases worldwide. Accordingly, this research presents a method for suggesting keywords for Persian scientific documents based on intelligent text processing and machine learning methods. The method builds on recommender systems and case-based reasoning: a set of keywords related to a document is suggested to the indexer so that the indexer can select appropriate keywords more quickly. In other words, documents similar to the new document are first retrieved using TF-IDF and word-to-vector representation methods, and candidate keywords are then selected from the similar documents using a ranking function. The proposed method was implemented on a set of documents from the Ganj database in three fields (engineering and technology, art and literature, and humanities), and its results were evaluated with measures such as precision, recall, and expert opinions.
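The two-step idea in the abstract (retrieve similar indexed documents, then rank their keywords) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the whitespace tokenizer, the smoothed IDF formula, and in particular the scoring rule (summing the similarity of each neighbour that carries a keyword) are hypothetical and are not the ranking function or the word2vec component used in the article.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumption): terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_keywords(new_text, case_base, k=2, top_n=3):
    """Suggest keywords for new_text from its k most similar indexed documents.

    case_base is a list of (text, keywords) pairs for documents already indexed.
    """
    token_lists = [text.lower().split() for text, _ in case_base]
    token_lists.append(new_text.lower().split())
    vectors = tfidf_vectors(token_lists)
    query = vectors[-1]
    # Step 1: retrieve the k most similar cases.
    sims = [(cosine(query, vec), idx) for idx, vec in enumerate(vectors[:-1])]
    neighbours = sorted(sims, reverse=True)[:k]
    # Step 2: score each candidate keyword by the summed similarity of the
    # neighbours that carry it (hypothetical scoring rule).
    scores = defaultdict(float)
    for sim, idx in neighbours:
        for keyword in case_base[idx][1]:
            scores[keyword] += sim
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in ranked[:top_n]]


# Toy case base with made-up documents and keywords.
case_base = [
    ("neural network training for image classification",
     ["neural networks", "image classification"]),
    ("deep neural network models for speech recognition",
     ["neural networks", "speech recognition"]),
    ("persian poetry in the classical period",
     ["persian literature", "poetry"]),
]
new_document = "training a neural network for speech classification"
suggestions = recommend_keywords(new_document, case_base)
print(suggestions)
```

In this toy run the keyword shared by both retrieved neighbours ranks first, which mirrors the assumption that similar documents share similar keywords. A realistic setup would replace the whitespace tokenizer with Persian-aware preprocessing.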

Keywords

Recommender systems

Case-based reasoning

Word-to-vector representation method

Information retrieval

Machine learning

Indexing

Subjects

Intelligent and recommender systems

Article title (English)

A Case-Based Recommender System for Persian Scientific Document Indexing

Authors (English)

Azadeh Mohebi ¹

Azadeh Fakhrzadeh ²

Marzieh Zarinbal ¹

¹ Iranian Research Institute for Information Science and Technology (IranDoc); Tehran, Iran

² Iranian Research Institute for Information Science and Technology (IranDoc); Tehran, Iran

Abstract (English)

Keyword extraction is a key step in document indexing. Keywords are semantic, content-based descriptors of a document that can be used in document retrieval and representation. In databases of scientific documents, such as Ganj at the Iranian Research Institute for Information Science and Technology (IranDoc), assigning meaningful keywords is even more critical, since the documents come from different academic disciplines and contain technical terms.
As the number of scientific documents grows exponentially, an automatic and intelligent keyword extraction technique becomes more critical. Existing keyword extraction techniques rely on statistical features of the text, on machine learning, or on a combination of both. In this research, we propose a new keyword extraction method for Persian scientific documents based on recommender systems and case-based reasoning, whose main assumption is that similar documents share similar keywords. The proposed approach has two main steps. First, documents similar to a given new document are retrieved using TF-IDF and a word2vec model. Second, candidate keywords are extracted from the retrieved documents and ranked with a new scoring scheme, and a set of keywords is selected from the candidates based on their scores. The proposed method is tested and evaluated on a set of documents from the Ganj database in three subject areas (Art, Humanities, and Engineering), based on precision, recall, and an expert panel.
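As a rough illustration of the precision and recall measures mentioned above, the sketch below compares a suggested keyword list with a reference list assigned by an indexer. The function name, the exact-match rule after lowercasing, and the sample keywords are assumptions for illustration only; the abstract does not state how matches were counted.

```python
def precision_recall(suggested, reference):
    """Precision and recall of suggested keywords against reference keywords.

    Matching is exact after trimming and lowercasing (an assumption).
    """
    suggested_set = {kw.strip().lower() for kw in suggested}
    reference_set = {kw.strip().lower() for kw in reference}
    hits = len(suggested_set & reference_set)
    precision = hits / len(suggested_set) if suggested_set else 0.0
    recall = hits / len(reference_set) if reference_set else 0.0
    return precision, recall


# Made-up example: two of four suggestions match the three reference keywords.
suggested = ["Recommender Systems", "Indexing", "Ontology", "Machine Learning"]
reference = ["recommender systems", "indexing", "case-based reasoning"]
precision, recall = precision_recall(suggested, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here precision is 2/4 and recall is 2/3. Exact matching penalizes near-synonyms, which is one reason an expert panel is a useful complement to these measures.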

Keywords (English)

Keyword Extraction

Recommender Systems

Case-Based Reasoning

Word2Vec Word Embedding

Information Retrieval

Machine Learning

Indexing
