Conversational Question Answering for Low-Resource Languages: A Novel Architecture Enhanced by Large Language Models

Mohebi, Azadeh; Aghadavoud Jolfaei, Safoura

doi:10.22034/jipm.2025.2072373.2101

Article type: Research article

Authors

Azadeh Mohebi ¹

Safoura Aghadavoud Jolfaei ²

¹ Assistant Professor, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran

² Iranian Research Institute for Information Science and Technology (IranDoc), Tehran

Abstract

Conversational question answering systems have evolved considerably with the emergence of large language models. These advances, however, have mainly benefited high-resource languages and have overlooked low-resource ones. This paper introduces a novel framework enhanced with large language models, designed specifically to close this language gap. The proposed architecture has six components: "Input Processing" for language-specific handling, an "Adaptive LLM Core", "Knowledge Enhancement" for cross-lingual mapping, "Context Management" for efficient conversation handling, "Response Generation" with cultural adaptation, and "Human Feedback" for continuous improvement. Unlike existing approaches, the framework takes cultural and linguistic considerations into account throughout the processing pipeline. The framework was validated through a qualitative evaluation by a focus group of five natural language processing experts. The expert evaluation confirmed the framework's effectiveness in addressing the core challenges of low-resource languages, including data scarcity, morphological complexity, and cultural nuance. The experts singled out the framework's innovative approach to "integrated cultural processing", its "resource efficiency" through optimized context management, and its "modular and scalable architecture" as key achievements.

Keywords

Conversational Question Answering

Machine Comprehension

Natural Language Processing

Subjects

Human-Machine Interaction

Article title (English)

Conversational Question Answering for Low-Resource Languages: A Novel Architecture Enhanced by Large Language Models

Authors (English)

Azadeh Mohebi ¹

Safoura Aghadavoud Jolfaei ²

¹ Assistant Professor, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran

² PhD Candidate, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran

Abstract (English)

Conversational Question Answering (CQA) systems have evolved significantly with the advent of Large Language Models (LLMs). However, these advancements have predominantly favored high-resource languages, often overlooking low-resource ones. This paper introduces a novel LLM-enhanced framework specifically designed to bridge this linguistic gap. The proposed architecture comprises six components: "Input Processing" for language-specific handling, an "Adaptive LLM Core," "Knowledge Enhancement" for cross-lingual mapping, "Context Management" for efficient conversation navigation, "Response Generation" incorporating cultural adaptation, and "Human Feedback" for continuous improvement. Unlike existing approaches, this framework integrates cultural and linguistic considerations throughout the entire processing pipeline. To validate the framework, a qualitative evaluation was conducted using a focus group consisting of five Natural Language Processing (NLP) experts. Expert evaluation results confirmed the proposed framework's effectiveness in addressing fundamental challenges of low-resource languages, including data scarcity, morphological complexities, and cultural nuances. Experts particularly highlighted the framework's innovative approach to "integrated cultural processing," "resource efficiency" via optimized context management, and its "modular and scalable architecture" as key achievements. This research demonstrates that integrating human feedback and cultural adaptation within an efficient architecture offers a practical solution for developing Conversational Question Answering systems in low-resource languages.
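To make the six-component structure easier to picture, the following is a minimal Python sketch of how such components might be chained for a single conversational turn. It is an illustration under stated assumptions, not the authors' implementation: every name here (`Conversation`, `process_input`, `manage_context`, `enhance_knowledge`, `build_prompt`, `generate_response`, `record_feedback`, `answer`) is hypothetical, the LLM core is any callable from prompt to text, and the keyword lookup and whitespace handling are trivial stand-ins for cross-lingual mapping and cultural adaptation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Turn:
    question: str
    answer: str


@dataclass
class Conversation:
    language: str
    history: List[Turn] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)


def process_input(text: str) -> str:
    # "Input Processing": stand-in for language-specific normalization;
    # here it only collapses repeated whitespace.
    return " ".join(text.split())


def manage_context(conv: Conversation, max_turns: int = 2) -> List[Turn]:
    # "Context Management": keep only the most recent turns to bound prompt size.
    return conv.history[-max_turns:]


def enhance_knowledge(question: str, knowledge: Dict[str, str]) -> List[str]:
    # "Knowledge Enhancement": naive keyword lookup standing in for
    # cross-lingual knowledge mapping (an assumption for illustration).
    q = question.lower()
    return [fact for key, fact in knowledge.items() if key in q]


def build_prompt(question: str, context: List[Turn],
                 facts: List[str], language: str) -> str:
    # Assemble the text handed to the LLM core.
    lines = [f"Language: {language}"]
    lines += [f"Fact: {fact}" for fact in facts]
    for turn in context:
        lines += [f"Q: {turn.question}", f"A: {turn.answer}"]
    lines.append(f"Q: {question}")
    return "\n".join(lines)


def generate_response(draft: str) -> str:
    # "Response Generation": stand-in for cultural adaptation;
    # here it only trims surrounding whitespace.
    return draft.strip()


def record_feedback(conv: Conversation, note: str) -> None:
    # "Human Feedback": store reviewer notes for later improvement.
    conv.feedback.append(note)


def answer(conv: Conversation, raw_question: str,
           llm: Callable[[str], str], knowledge: Dict[str, str]) -> str:
    # One conversational turn through the hypothetical pipeline.
    question = process_input(raw_question)
    context = manage_context(conv)
    facts = enhance_knowledge(question, knowledge)
    prompt = build_prompt(question, context, facts, conv.language)
    reply = generate_response(llm(prompt))  # llm plays the "Adaptive LLM Core"
    conv.history.append(Turn(question, reply))
    return reply
```

Keeping each component as a separate function mirrors the modular design the experts highlighted: any stage could be swapped for a language-specific implementation without touching the others.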

Keywords (English)

Conversational Question Answering

Interactive Question Answering

Machine Comprehension

Natural Language Processing
