A Corpus-based study of Persian noun and adjective homographs to help right POS tagging
Elham Alayiaboozar
Assistant Professor, Iranian Research Institute for Information Science and Technology (IranDoc)
Abstract:
This research studies the morphological structure of Persian nouns and adjectives. There are two main reasons for studying it when building a POS tagger that tags nouns: 1. If the system encounters an out-of-vocabulary (OOV) word, one way to identify its tag is to consider its morphological structure. 2. Persian's complex morphology produces many homographs; studying the morphological structure of nouns in order to distinguish them from adjectives is necessary, because many adjectives share the orthographic form of a noun and would be wrongly tagged as “noun”, or vice versa. After examining the morphological structure of nouns and adjectives, the study examines the Persian writing system and then presents the definition of homographs and the related classifications. Finally, the study uses several well-known Persian corpora to compile a list of homographs: the Bijankhan corpus and the syntactic dependency corpus (vabastegi ye nahvi), which were searched for homographs with search tools, and the Data Center for Persian Language (Paygah e Dadegan), whose untagged file was available, so its homographs were searched for and tagged manually. Analysis of this list showed that homographs are frequent in Persian corpora, especially those arising from the identical orthographic form of the indefinite morpheme, the adjective-maker morpheme, and the second person inflectional morpheme, which makes POS tagging difficult.
Keywords: POS tagger system, morphological structure of Persian nouns and adjectives, Persian writing system, homographs
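The corpus search described in the abstract can be illustrated with a minimal sketch, assuming a corpus already available as (word form, tag) pairs. The function name `find_noun_adj_homographs`, the simplified tag labels `N` and `ADJ`, and the transliterated toy tokens are all hypothetical; they are not taken from the Bijankhan corpus or the other corpora named above, and this is not the procedure the study itself used.

```python
from collections import Counter, defaultdict


def find_noun_adj_homographs(tagged_tokens, noun_tag="N", adj_tag="ADJ"):
    """Return word forms that occur with both a noun tag and an adjective tag.

    tagged_tokens: iterable of (word_form, tag) pairs.
    The tag labels are assumed, simplified placeholders.
    Result maps each homograph to (noun_count, adjective_count).
    """
    # Count how often each orthographic form appears under each tag.
    counts = defaultdict(Counter)
    for form, tag in tagged_tokens:
        counts[form][tag] += 1

    # Keep only the forms attested as both noun and adjective.
    homographs = {}
    for form, tag_counts in counts.items():
        if tag_counts[noun_tag] > 0 and tag_counts[adj_tag] > 0:
            homographs[form] = (tag_counts[noun_tag], tag_counts[adj_tag])
    return homographs


# Invented toy data for illustration only (transliterated, not real corpus counts).
toy_corpus = [
    ("irani", "ADJ"),
    ("irani", "N"),
    ("ketab", "N"),
    ("bozorg", "ADJ"),
    ("irani", "ADJ"),
]

homographs = find_noun_adj_homographs(toy_corpus)
print(homographs)
```

A list built this way only surfaces forms that are already tagged both ways; for an untagged file such as the one mentioned for Paygah e Dadegan, candidates would still need manual tagging, as the abstract notes.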
Type of Study: Research | Subject: Information Technology
Received: 2017/10/14 | Accepted: 2018/02/25 | Published: 2018/03/11
Alayiaboozar E. A Corpus-based study of Persian noun and adjective homographs to help right POS tagging. Journal of Information Processing and Management. 2009;
URL: http://jipm.irandoc.ac.ir/article-1-3740-en.html


پژوهشنامه پردازش و مدیریت اطلاعات Journal of Information processing and Management