Feasibility study of ontological development based on semi-automatic method based on lexical frequency analysis: A case study of "glaucoma"



Following recent trends in information management systems, conventional word-based information retrieval methods are changing to concept-based approaches by means of the broad application of ontologies. More specifically, the use of ontologies for knowledge management is significant in the medical sciences and human disease domains due to the diversity and necessity of information sharing between numerous data repositories such as medical records, health record systems, and so on. Furthermore, ontologies make natural language processing approaches more feasible by reducing semantic ambiguity and making concepts comprehensible to computer-based deductions. In this research, a semi-automated approach for ontology development is proposed, which assists in identifying structural components of an ontology and determining possible relations between them based on scientific text records. The proposed approach, in a general view, includes the gathering of a large volume of technical data in text format, processing, and extraction of results with a minimal contribution of human-based supervision. The processing stage is coded in Matlab code named TmbOnt_Alfa and applies two main techniques including word frequency and Lexico-Synactic patterns analysis, to identify concepts and relations, respectively. The role of the human supervisor is narrowed to entering target terms, eliminating unnecessary outputs, and finalizing the ontology structure. In order to evaluate the efficiency of the proposed method, a case study for ontological development in the field of glaucoma has been conducted, and results are compared with medical subject headings of MESH descriptors, the Persian medical thesaurus, ontology of diseases, and Bioassay ontology (BAO).
According to results, the developed ontology, when compared by Glaucoma entry, covered 80% of the medical titles in Mesh, 100% of the medical terms developed in the Persian Medical Thesaurus, and 100% of the Persian medical descriptors. Moreover, the resultant ontology structure is compatible with more than 90% of the same ontology represented in Bioassay and 57% of the ontology of diseases (DO). It also proposed an average of 30% more terms for existing ontological structures.