Iranian Journal of Information Processing and Management

Feasibility Study of Ontological Development Using Semi-Automatic Method based on Lexical Frequency Analysis: A Case Study of “glaucoma”

Authors
1 Islamic Azad University; Science and Research Branch; Tehran, Iran
2 Iranian Research Institute for Information Science and Technology (IranDoc); Tehran, Iran
Abstract
Following recent trends in information management systems, conventional word-based information retrieval methods are giving way to concept-based approaches through the broad application of ontologies. In particular, ontologies are important for knowledge management in the medical sciences and the domain of human diseases. These fields hold many diverse data repositories, such as medical records and health record systems, and need to share information among them. Ontologies also make natural language processing more feasible by reducing semantic ambiguity and making concepts interpretable by computer-based reasoning. In this research, a semi-automated approach to ontology development is proposed. It helps identify the structural components of an ontology, and the possible relations between them, from scientific text records. In general terms, the approach gathers a large volume of technical data in text format, processes it, and extracts results with minimal human supervision. The processing stage is implemented in a MATLAB program named TmbOnt_Alfa. The program applies two main techniques, word frequency analysis and lexico-syntactic pattern analysis, to identify concepts and relations, respectively. The human supervisor's role is limited to entering target terms, eliminating unnecessary outputs, and finalizing the ontology structure. To evaluate the efficiency of the proposed method, a case study of ontology development in the field of glaucoma was conducted. Its results were compared with the Medical Subject Headings (MeSH) descriptors, the Persian Medical Thesaurus, the Disease Ontology (DO), and the BioAssay Ontology (BAO).
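The abstract does not include TmbOnt_Alfa's code. As a rough illustration only, the Python sketch below shows the two techniques it names. First, it ranks candidate concepts by word frequency. Second, it extracts candidate is-a relations with simple lexico-syntactic patterns of the "X such as Y" and "X, including Y" kind, in the style of Hearst's hyponym patterns. The stop-word list, the two regular expressions, and the function names `candidate_concepts` and `hearst_relations` are hypothetical assumptions, not the paper's implementation.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"the", "of", "and", "or", "in", "a", "an", "is", "are",
              "to", "with", "by", "for", "such", "as", "including"}

# Hypothetical lexico-syntactic patterns: "<hypernym> such as <list>" and
# "<hypernym>, including <list>"; the list runs up to the next . ; or :
SUCH_AS = re.compile(r"(\w+) such as ([^.;:]+)")
INCLUDING = re.compile(r"(\w+),? including ([^.;:]+)")


def candidate_concepts(texts, top_n=10):
    """Rank single-word candidate concepts by raw frequency across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return counts.most_common(top_n)


def hearst_relations(text):
    """Return naive (hyponym, 'is_a', hypernym) triples matched by the patterns."""
    relations = []
    for pattern in (SUCH_AS, INCLUDING):
        for match in pattern.finditer(text.lower()):
            hypernym, tail = match.group(1), match.group(2)
            # Split the enumerated list on commas and the whole words "and"/"or".
            for item in re.split(r",|\band\b|\bor\b", tail):
                item = item.strip()
                if item:
                    relations.append((item, "is_a", hypernym))
    return relations


if __name__ == "__main__":
    corpus = [
        "Glaucoma damages the optic nerve.",
        "Glaucoma raises intraocular pressure in the eye.",
    ]
    print(candidate_concepts(corpus, top_n=1))  # [('glaucoma', 2)]
    print(hearst_relations(
        "Glaucoma is treated with drugs such as timolol, latanoprost and brimonidine."))
```

The patterns are deliberately naive. A realistic pipeline would add noun-phrase chunking and lemmatization (here the hypernym comes out as the plural "drugs"). As the abstract describes, a human supervisor would still discard spurious candidates before the ontology structure is finalized.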
According to the results, the developed ontology was compared on the Glaucoma entry. It covered 80% of the subject headings in MeSH, 100% of the medical terms in the Persian Medical Thesaurus, and 100% of the Persian medical descriptors. The resulting ontology structure is also consistent with more than 90% of the corresponding structure in the BioAssay Ontology (BAO) and with 57% of that in the Disease Ontology (DO). In addition, it proposed on average 30% more terms for the existing ontological structures.
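Coverage figures like these are percentages of a reference vocabulary that the developed ontology matches. The following is a minimal sketch of that arithmetic under two assumptions. First, it uses exact, case-insensitive term matching. Second, it reads "more terms" as developed terms absent from the reference, expressed relative to the reference size. The abstract states neither rule, and the study's comparison may have involved expert judgment. The term lists are toy examples, not the study's data, and the function names are hypothetical.

```python
def _normalise(terms):
    """Lower-case and trim terms so matching is case-insensitive (an assumption)."""
    return {t.strip().lower() for t in terms if t.strip()}


def coverage_percent(developed_terms, reference_terms):
    """Share of reference terms (e.g. one thesaurus entry) found in the developed ontology."""
    reference = _normalise(reference_terms)
    if not reference:
        raise ValueError("reference vocabulary is empty")
    return 100.0 * len(reference & _normalise(developed_terms)) / len(reference)


def extra_terms_percent(developed_terms, reference_terms):
    """Developed terms missing from the reference, relative to its size (assumed definition)."""
    reference = _normalise(reference_terms)
    if not reference:
        raise ValueError("reference vocabulary is empty")
    return 100.0 * len(_normalise(developed_terms) - reference) / len(reference)


if __name__ == "__main__":
    # Toy vocabularies for illustration only; not the study's data.
    developed = ["Open-angle glaucoma", "Angle-closure glaucoma",
                 "Ocular hypertension", "Trabeculectomy"]
    reference = ["open-angle glaucoma", "angle-closure glaucoma",
                 "congenital glaucoma", "ocular hypertension",
                 "normal-tension glaucoma"]
    print(coverage_percent(developed, reference))     # 60.0
    print(extra_terms_percent(developed, reference))  # 20.0
```

Normalizing both sides into sets keeps duplicate or differently capitalized entries from inflating either percentage.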

  • Received: 10 December 2022
  • Revised: 28 January 2023