Feasibility Study of Ontological Development Using Semi-Automatic Method based on Lexical Frequency Analysis: A Case Study of “glaucoma”

Tamjid, Somayeh; Nooshinfard, Fatemeh; Hoseini Beheshti, Moluk S.; Hariri, Nadjla; Babolhavaeji, Fahimeh

doi:10.22034/jipm.2023.698596

Authors

Somayeh Tamjid ¹

Fatemeh Nooshinfard ¹

Moluk S. Hoseini Beheshti ²

Nadjla Hariri ¹

Fahimeh Babolhavaeji ¹

¹ Islamic Azad University; Science and Research Branch; Tehran, Iran

² Iranian Research Institute for Information Science and Technology (IranDoc); Tehran, Iran

Abstract

The shift in information systems from word processing to concept processing has drawn attention to ontologies. In the medical sciences and human diseases, the diversity of terminology and the need to share information across different software systems, such as medical records and health record registration systems, make the use of ontologies appear essential. This study proposes a semi-automatic approach to ontology development that uses text-mining tools to facilitate the identification of an ontology's structural components and the partial determination of relations from scientific texts. The proposed model is implemented as software code under the abbreviated name TmbOnt_Alfa. Through a user interface, the code loads an input text file and, after processing it according to the settings, extracts key terms for ontology development. To evaluate the efficiency of the proposed method, a case study was conducted on the disease glaucoma, using a text dataset of 10,000 article abstracts retrieved from PubMed by keyword search. After the processing steps, the resulting concepts and hierarchical structure of the ontology were entered into the Protégé software. Finally, a comparative assessment of the developed ontology against the Medical Subject Headings (MeSH), the Persian Medical Thesaurus and Descriptors, the Disease Ontology, and the Biomedical Ontology showed that the average concept precision and the average positional precision of concepts matched, by more than 70 percent, the ontologies represented in reputable human disease ontology databases, and that the approach supplied, on average, more than 30 percent new terms for addition to the domain.
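The word-frequency step described above can be illustrated with a minimal Python sketch. It is not the TmbOnt_Alfa code: the stop-word list, the maximum n-gram length, the frequency threshold, and the three sample sentences are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the study's actual settings are not given here.
STOP_WORDS = {"the", "of", "and", "in", "a", "an", "is", "are", "was",
              "were", "with", "to", "for", "by", "on"}

def tokenize(text):
    """Lowercase the text and return alphabetic tokens, keeping hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def candidate_terms(abstracts, max_len=2, min_count=2):
    """Count n-grams of 1..max_len words that neither start nor end with a stop word,
    and return those seen at least min_count times, most frequent first."""
    counts = Counter()
    for abstract in abstracts:
        tokens = tokenize(abstract)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]

# Invented sample text standing in for the PubMed abstracts.
abstracts = [
    "Intraocular pressure is elevated in open-angle glaucoma.",
    "Open-angle glaucoma damages the optic nerve.",
    "Lowering intraocular pressure protects the optic nerve.",
]
terms = dict(candidate_terms(abstracts))
for term, count in terms.items():
    print(count, term)
```

In a semi-automatic workflow of the kind described, a list like this would only be a starting point: a human supervisor would still discard irrelevant terms before anything is entered into an ontology editor.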

Keywords

Ontology

Text mining

Knowledge representation

“Glaucoma”

Human diseases

Protégé

Article title (English)

Feasibility Study of Ontological Development Using Semi-Automatic Method based on Lexical Frequency Analysis: A Case Study of “glaucoma”

Authors (English)

Somayeh Tamjid ¹

Fatemeh Nooshinfard ¹

Moluk S. Hoseini Beheshti ²

Nadjla Hariri ¹

Fahimeh Babolhavaeji ¹

¹ Islamic Azad University; Science and Research Branch; Tehran, Iran

² Iranian Research Institute for Information Science and Technology (IranDoc); Tehran, Iran

Abstract (English)

Following recent trends in information management systems, conventional word-based information retrieval methods are giving way to concept-based approaches through the broad application of ontologies. In particular, ontologies matter for knowledge management in the medical sciences and human disease domains, because terminology is diverse and information must be shared among numerous data repositories such as medical records and health record systems. Furthermore, ontologies make natural language processing approaches more feasible by reducing semantic ambiguity and making concepts amenable to computer-based inference. In this research, a semi-automated approach for ontology development is proposed, which assists in identifying the structural components of an ontology and determining possible relations between them based on scientific text records. In outline, the approach gathers a large volume of technical text, processes it, and extracts results with minimal human supervision. The processing stage is implemented in MATLAB code named TmbOnt_Alfa and applies two main techniques, word frequency analysis and lexico-syntactic pattern analysis, to identify concepts and relations, respectively. The role of the human supervisor is limited to entering target terms, eliminating unnecessary outputs, and finalizing the ontology structure. To evaluate the efficiency of the proposed method, a case study of ontology development in the field of glaucoma was conducted, and the results were compared with the Medical Subject Headings (MeSH) descriptors, the Persian Medical Thesaurus, the Disease Ontology (DO), and the BioAssay Ontology (BAO).
According to the results, the developed ontology, when compared on the Glaucoma entry, covered 80% of the medical headings in MeSH, 100% of the medical terms in the Persian Medical Thesaurus, and 100% of the Persian medical descriptors. Moreover, the resulting ontology structure is compatible with more than 90% of the corresponding structure represented in BAO and 57% of that in the Disease Ontology (DO). It also proposed, on average, 30% more terms for the existing ontological structures.
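The lexico-syntactic pattern analysis named above can be sketched in Python with a single Hearst-style "such as" pattern. This is an illustrative assumption, not the pattern set used in TmbOnt_Alfa: the regular expressions and the sample sentences are hypothetical, and the hypernym is crudely taken to be the one or two words that precede "such as".

```python
import re

# "<hypernym> such as <hyponym>, <hyponym> and <hyponym>"
# Assumption: the hypernym is the one or two words immediately before "such as",
# and the hyponym list runs to the end of the clause.
SUCH_AS = re.compile(r"\b(\w[\w-]*(?: \w[\w-]*)?) such as ([^.;]+)")
LIST_SEP = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")

def extract_is_a(sentence):
    """Return (hyponym, 'is_a', hypernym) triples found in one sentence."""
    triples = []
    for match in SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group(1)
        for hyponym in LIST_SEP.split(match.group(2)):
            hyponym = hyponym.strip()
            if hyponym:
                triples.append((hyponym, "is_a", hypernym))
    return triples

# Invented sentences standing in for abstract text.
sentences = [
    "Glaucoma drugs include prostaglandin analogs such as latanoprost, travoprost and bimatoprost.",
    "The cohort included optic neuropathies such as glaucoma and ischemic optic neuropathy.",
]
candidates = [t for s in sentences for t in extract_is_a(s)]
for triple in candidates:
    print(triple)
```

A raw regular expression like this over-generates as soon as the list is followed by more text in the same clause, which is one reason a human reviewer is kept in the loop to eliminate unnecessary outputs and finalize the hierarchy.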

Keywords (English)

Ontology

Text Mining

Information Representation

Glaucoma

Eye Disease

Medical Thesaurus

Protégé
