Volume 32, Issue 4 (Summer 2017)
2017, 32(4): 1143-1170
Categorization of Various Essential Datasets and Methods for Textual Spelling Detection and Normalization
Molouk Sadat Hosseini Beheshti, Hadi Abdi Ghavidel
Assistant Professor, Iranian Research Institute for Information Science and Technology (IranDoc)
Abstract:
Spelling error detection and grapheme normalization form one of the earliest phases of automatic text processing. Textual documents stored without passing through this phase cause several problems, which in turn disrupt automatic document retrieval. Specialists in natural language processing and computational linguistics therefore sample various data and propose methods and algorithms for obtaining normalized data. Many studies have addressed English and some other languages, and a number of studies on Farsi have followed. Some of this work has remained purely academic, while some has been released as products. This paper categorizes the methods and essential datasets used in these studies, describes each category individually, and gives a general account of the evaluation methods. It also describes the performance of monolingual Farsi systems and how they address the challenges specific to Farsi.
Keywords: Spelling Error Detection, Grapheme Normalization, Categorization of the Methods, Monolingual Farsi Systems, Farsi Language Challenges
Type of Study: Review | Subject: Information Technology
Received: 2016/01/26 | Accepted: 2016/08/02 | Published: 2016/08/21

Hosseini Beheshti M S, Abdi Ghavidel H. Categorization of Various Essential Datasets and Methods for Textual Spelling Detection and Normalization. Journal of Information Processing and Management. 2017; 32(4): 1143-1170
URL: http://jipm.irandoc.ac.ir/article-1-3088-en.html

Journal of Information Processing and Management (پژوهشنامه پردازش و مدیریت اطلاعات)