Volume 36, Issue 3 (Spring 2021) ... 2021, 36(3): 791-816





Mozafari N. A Genetic-based Approach for Author Name Disambiguation Problem. .... 2021; 36(3): 791-816
URL: http://jipm.irandoc.ac.ir/article-1-4477-en.html
Regional Information Center for Science and Technology; Islamic World Science Citation Center; Shiraz, Iran
Abstract:
In recent years, the growing volume of published articles and the widespread use of the Internet and search engines have drawn considerable attention to the author name disambiguation problem. Name ambiguity arises when one seeks the list of publications of an author who has published under different name variations, and also when several different authors share the same name. Various methods have been proposed to solve this problem, each with its own advantages and disadvantages, yet despite years of research the problem remains largely unresolved. In this study, we propose an algorithm that identifies the records belonging to the same author. For this purpose, a new criterion is proposed to measure the similarity between two records. Since the study addresses approximate matching of authors' records, the importance of each field in a record is expressed by a coefficient. To obtain the optimal coefficients, we propose a genetic algorithm that learns them from the available samples. The proposed method was evaluated with two fitness functions on experimental data, and the results are promising.
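The idea of weighting record fields by coefficients and learning those coefficients with a genetic algorithm can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the method of the article: the field names in `FIELDS`, the use of `difflib.SequenceMatcher` as the per-field similarity, the accuracy-based `fitness`, the decision `threshold`, and the selection, crossover, and mutation settings are all hypothetical choices made for the example.

```python
import random
from difflib import SequenceMatcher

# Hypothetical record fields; the article does not list its fields here.
FIELDS = ("name", "affiliation", "title")


def field_similarity(a, b):
    """Approximate string similarity in [0, 1] (a stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2, weights):
    """Weighted average of per-field similarities between two records."""
    total = sum(weights)
    if total == 0:
        return 0.0
    score = sum(w * field_similarity(r1[f], r2[f]) for w, f in zip(weights, FIELDS))
    return score / total


def fitness(weights, pairs, threshold=0.5):
    """Assumed fitness: fraction of labelled pairs classified correctly."""
    correct = sum(
        (record_similarity(a, b, weights) >= threshold) == same
        for a, b, same in pairs
    )
    return correct / len(pairs)


def evolve(pairs, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Search for field weights with a small, simple genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in FIELDS] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        population.sort(key=lambda w: fitness(w, pairs), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, len(FIELDS))  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation, clipped so weights stay in [0, 1].
            child = [
                min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                if rng.random() < mutation_rate else g
                for g in child
            ]
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, pairs))
```

Here `pairs` would be a list of `(record_a, record_b, same_author)` tuples built from labelled samples, and the weights returned by `evolve` would then be passed to `record_similarity` when comparing unseen records.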
Type of Study: Research | Subject: Information Storage and Retrieval
Received: 2020/04/19 | Accepted: 2020/10/07 | Published: 2021/04/05



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2022 CC BY-NC 4.0 | Iranian Journal of Information processing and Management
