Few-Shot Multilingual Open-Domain QA from Five Examples

Fan Jiang, Tom Drummond, Trevor Cohn


Abstract
Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data. However, the considerable annotation cost limits the application of these methods to underrepresented languages. We introduce a few-shot learning approach to synthesize large-scale multilingual data from large language models (LLMs). Our method begins with large-scale self-supervised pre-training using WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot supervision. The final model, FsModQA, significantly outperforms existing few-shot and supervised baselines in MLODQA and in cross-lingual and monolingual retrieval. We further show that our method can be extended to effective zero-shot adaptation to new languages through a cross-lingual prompting strategy with only English-supervised data, making it a general and widely applicable solution for MLODQA tasks without costly large-scale annotation.
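As a rough illustration of the few-shot synthesis step the abstract describes, the sketch below builds a prompt from a handful of annotated (passage, question, answer) demonstrations and asks an LLM to produce a QA pair for each unlabeled target-language passage. This is a minimal sketch, not the paper's actual pipeline: the prompt template, the `QAExample` structure, and the `generate` callable (standing in for whatever LLM API is used) are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAExample:
    """One annotated (question, answer) pair grounded in a passage."""
    passage: str
    question: str
    answer: str


def build_fewshot_prompt(examples: List[QAExample], passage: str) -> str:
    """Assemble a few-shot prompt: k annotated demonstrations followed by
    a new passage for which the LLM should complete a question and answer."""
    parts = [
        f"Passage: {ex.passage}\nQuestion: {ex.question}\nAnswer: {ex.answer}\n"
        for ex in examples
    ]
    parts.append(f"Passage: {passage}\nQuestion:")
    return "\n".join(parts)


def synthesize_qa(
    generate: Callable[[str], str],   # any LLM text-completion function
    fewshot: List[QAExample],         # e.g. the five annotated examples
    passages: List[str],              # unlabeled target-language passages
) -> List[QAExample]:
    """Generate one synthetic QA pair per unlabeled passage."""
    synthetic = []
    for passage in passages:
        completion = generate(build_fewshot_prompt(fewshot, passage))
        # Expect the completion to continue the template as
        # "<question>\nAnswer: <answer>"; skip malformed outputs.
        question, _, answer = completion.partition("\nAnswer:")
        if question.strip() and answer.strip():
            synthetic.append(QAExample(passage, question.strip(), answer.strip()))
    return synthetic
```

Under the same assumptions, the zero-shot adaptation the abstract mentions would drive this loop with English demonstrations while instructing the model to write questions in the target language, so no target-language annotation is needed.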
Anthology ID:
2025.tacl-1.24
Volume:
Transactions of the Association for Computational Linguistics, Volume 13
Year:
2025
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
481–504
URL:
https://preview.aclanthology.org/corrections-2025-07/2025.tacl-1.24/
DOI:
10.1162/tacl_a_00750
Cite (ACL):
Fan Jiang, Tom Drummond, and Trevor Cohn. 2025. Few-Shot Multilingual Open-Domain QA from Five Examples. Transactions of the Association for Computational Linguistics, 13:481–504.
Cite (Informal):
Few-Shot Multilingual Open-Domain QA from Five Examples (Jiang et al., TACL 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-07/2025.tacl-1.24.pdf