Introducing a probabilistic–structural method for grapheme-to-phoneme conversion in Persian




The Persian writing system deviates from an ideal one because it lacks a one-to-one correspondence between graphemes and phonemes. The present study asks how, despite the absence of short vowels in the Persian writing system and the one-to-many and many-to-one relationships between graphemes and phonemes, Persian speakers are able to read out-of-vocabulary words. It introduces a probabilistic–structural method that Persian speakers use to read out-of-vocabulary words, in which structural information (including Persian morphology and morphophonemic rules) as well as Arabic morphological templates are considered. To test how the introduced method works, Persian speakers were asked to read a list of out-of-vocabulary words. The same list was given as input to ID3 and MLP, two machine learning methods, and the outputs of the introduced method and those of ID3 and MLP were then compared with the Persian speakers' pronunciations. The results showed that the introduced method functions similarly to Persian speakers in reading out-of-vocabulary words.