Text Recognition in Printed Persian Documents Based on Recurrent Neural Networks

Fakhrzadeh, Azadeh; Seddighi, Amir Hossein; Eshratabadi, Mohammad; Esfandyari, Alborz

doi:10.22034/jipm.2025.2052358.1926


Article type: Research article

Authors

Azadeh Fakhrzadeh ¹

Amir Hossein Seddighi ²

Mohammad Eshratabadi ³

Alborz Esfandyari ⁴

¹ PhD in Computer Image Processing; Assistant Professor, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran.

² PhD in Industrial Engineering; Assistant Professor, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran.

³ Research Assistant, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran.

⁴ Research Assistant, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran.


Abstract

Automatic recognition of Persian text has always been challenging because of distinctive features of the Persian script: its connected (cursive) structure, the many visual features shared between letters, and the wide variation in letter shapes depending on their position within a word. This research aims to present an optical character recognition (OCR) model that can convert printed and scientific Persian documents, including theses, articles, and books, into editable text. This capability is a necessity for labeling, indexing, and retrieving information in databases. The paper presents a hybrid approach based on deep learning architectures for Persian text recognition. In this approach, convolutional neural networks are used to extract features and recurrent neural networks to recognize words. The main advantage of the model is that it recognizes printed Persian text directly, without complex preprocessing such as letter segmentation. The proposed model was trained on a large, purpose-built dataset of more than two million samples generated with five common Persian fonts. The model achieves 81 percent accuracy in recognizing Persian letters and 60 percent in recognizing words. Most of the errors in words were related to the half-space and to symbols.
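The abstract reports accuracy at two levels (letters and words) but does not say how either figure is computed. The sketch below assumes one common convention: character accuracy derived from Levenshtein edit distance, and word accuracy as exact match. The function names and the three-word sample are hypothetical and are not taken from the paper. It illustrates why a single dropped half-space (zero-width non-joiner, U+200C) costs little at the character level but turns a whole word into an error.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(refs, hyps):
    """1 - (total edit distance / total reference characters), floored at 0."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return max(0.0, 1.0 - errors / total)


def word_accuracy(refs, hyps):
    """Fraction of words that are reproduced exactly."""
    correct = sum(r == h for r, h in zip(refs, hyps))
    return correct / len(refs)


ZWNJ = "\u200c"  # the Persian half-space

# Hypothetical sample: the recognizer drops the half-space in the first word.
refs = ["می" + ZWNJ + "شود", "کتاب", "مقاله"]
hyps = ["می" + "شود", "کتاب", "مقاله"]

print(f"character accuracy: {char_accuracy(refs, hyps):.3f}")
print(f"word accuracy:      {word_accuracy(refs, hyps):.3f}")
```

In this toy sample one missing character out of fifteen leaves character accuracy above 0.93, while word accuracy falls to two out of three. Under this assumed scoring, a gap between letter-level and word-level figures is expected whenever errors are spread across many words.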

Keywords

Optical Character Recognition, Long Short-Term Memory, Recurrent Neural Network, Convolutional Neural Network

Subjects

Technology and Information

Article title (English)

Text Recognition in Printed Persian Documents Based on Recurrent Neural Networks

Authors (English)

Azadeh Fakhrzadeh ¹

Amir Hossein Seddighi ²

Mohammad Eshratabadi ³

Alborz Esfandyari ⁴

¹ PhD in Digital Image Processing, Uppsala University; Assistant Professor, Information Technology Research Department, Iranian Research Institute for Information Science & Technology (IranDoc), Tehran, Iran

² PhD in Industrial Engineering, Amirkabir University of Technology; Assistant Professor, Information Technology Research Department, Iranian Research Institute for Information Science & Technology (IranDoc), Tehran, Iran

³ B.Sc. in Computer Engineering, Amirkabir University of Technology; Research Assistant, Information Technology Research Department, Iranian Research Institute for Information Science & Technology (IranDoc), Tehran, Iran

⁴ M.Sc. in Software Engineering, Isfahan University of Technology (IUT); Research Assistant, Information Technology Research Department, Iranian Research Institute for Information Science & Technology (IranDoc), Tehran, Iran

Abstract (English)

Automatic Persian text recognition has always been challenging due to the unique characteristics of the Persian script, including its connected structure, the high visual similarity between letters, and the significant variation in the shape of letters depending on their position within a word. The aim of this research is to develop an optical character recognition (OCR) model capable of converting printed and scientific Persian documents, including theses, articles, and books, into editable text. Such a model is essential for tasks like labeling, indexing, and information retrieval in databases. This paper proposes a hybrid approach based on deep learning architectures for Persian text recognition. In this method, convolutional neural networks (CNNs) are used for feature extraction and recurrent neural networks (RNNs) for word recognition. The main advantage of this model is its ability to recognize printed Persian text directly, without relying on complex preprocessing steps such as letter segmentation. The proposed model is trained on a large, dedicated dataset of over two million samples generated in five common Persian fonts. The model achieves an accuracy of 81 per cent in recognizing Persian letters and 60 per cent in recognizing words. Most word-level errors involve half-spaces (zero-width non-joiners) and symbols.
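The abstract does not say how the recurrent network's per-frame outputs become a word without letter segmentation. One widely used option for CNN-RNN recognizers is connectionist temporal classification (CTC) with greedy decoding. The sketch below assumes that scheme purely for illustration. The function name, the toy alphabet, and the choice of index 0 as the blank label are all hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores, alphabet):
    """Greedy CTC decoding.

    frame_scores: one list of class scores per image column (time step).
    alphabet:     maps class index to a character; index BLANK is unused.
    """
    # 1. Pick the highest-scoring class in every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    # 2. Collapse consecutive repeats of the same class.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks; what remains is the recognized character sequence.
    return "".join(alphabet[label] for label in collapsed if label != BLANK)


def one_hot(index, size):
    """Helper that fakes a confident network output for one frame."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy alphabet: blank plus the four letters of one Persian word.
alphabet = ["", "س", "ل", "ا", "م"]

# Eight frames with repeated labels and blanks between some letters.
path = [1, 1, 0, 2, 3, 0, 4, 4]
frames = [one_hot(i, len(alphabet)) for i in path]
decoded = ctc_greedy_decode(frames, alphabet)
print(decoded)
```

Because repeats are collapsed before blanks are removed, a doubled letter survives only when a blank frame separates its two occurrences. This is how the decoder can emit a variable-length word from a fixed number of frames without locating letter boundaries in the image.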

Keywords (English)

Optical Character Recognition, Long Short-Term Memory, Recurrent Neural Network, Convolutional Neural Network
