Iranian Journal of Information Processing and Management

Proposing a Method Based on Analysis of Bibliographic References to Identify Related Scientific Articles in Digital Libraries

Document Type: Original Article

Author
Regional Information Center for Science and Technology; Shiraz, Iran.
Abstract
The volume of scientific documents and articles has increased dramatically over the last decade, which makes it difficult to find the documents relevant to a user's query. Information retrieval systems help researchers find relevant scientific articles. One capability that supports this is a related-articles feature: by selecting one article, the researcher can view other articles related to it.
The purpose of this research is to present a method, based on the analysis of bibliographic references, for identifying related scientific articles in digital libraries. The statistical population consists of computer science articles published in the last five years in Persian and English publications indexed in the ISC. The proposed method finds the articles most similar to a given article by analyzing the references of the articles. After the titles are extracted and the similarity between the references of the articles is computed, the articles with the highest similarity are identified and ranked by that similarity. The proposed method has been compared with other methods, and the results obtained on both Persian and English data are promising.
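The ranking step described above can be illustrated with a minimal sketch. The abstract does not state which similarity measure is used, so this example assumes that the extracted titles are the titles of the cited references and that similarity is the Jaccard overlap between two articles' sets of normalized reference titles. The function names (`normalize_title`, `reference_similarity`, `rank_related`) and the sample data are hypothetical and are not taken from the paper.

```python
import re
from typing import Dict, List, Set, Tuple


def normalize_title(title: str) -> str:
    """Lowercase a reference title and strip punctuation and extra spaces."""
    # \w is Unicode-aware for str patterns in Python 3, so Persian letters survive.
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def reference_similarity(refs_a: Set[str], refs_b: Set[str]) -> float:
    """Jaccard overlap between two sets of normalized reference titles.

    Assumption: the paper may use a different measure; Jaccard is a stand-in.
    """
    if not refs_a or not refs_b:
        return 0.0
    return len(refs_a & refs_b) / len(refs_a | refs_b)


def rank_related(
    query_id: str,
    references: Dict[str, List[str]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank the other articles by reference similarity to the query article."""
    normalized = {
        article_id: {normalize_title(t) for t in titles}
        for article_id, titles in references.items()
    }
    query_refs = normalized[query_id]
    scored = [
        (article_id, reference_similarity(query_refs, refs))
        for article_id, refs in normalized.items()
        if article_id != query_id
    ]
    # Keep only articles that share at least one reference with the query.
    scored = [(article_id, score) for article_id, score in scored if score > 0]
    # Highest similarity first; ties are broken by article id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical sample: article id -> titles of the references it cites.
sample_references = {
    "A": [
        "A survey of text similarity approaches.",
        "Word2Vec",
        "Approximate string matching",
    ],
    "B": ["A Survey of Text Similarity Approaches", "word2vec"],
    "C": ["Taxicab geometry"],
    "D": ["Approximate String Matching", "Taxicab geometry"],
}

for article_id, score in rank_related("A", sample_references):
    print(f"{article_id}: {score:.2f}")
```

In this sample, article B shares two of three distinct reference titles with A and is ranked first, D shares one of four and is ranked second, and C shares none and is dropped. Exact matching on normalized titles is the simplest possible choice; a fuzzy string or semantic similarity between titles could be substituted in `reference_similarity` without changing the ranking logic.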


  • Receive Date 29 January 2023
  • Revise Date 30 May 2023
  • Accept Date 31 May 2023