Few-Shot Multilingual Open-Domain QA from Five Examples

Fan Jiang, Tom Drummond, Trevor Cohn


Abstract
Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data. However, the considerable annotation cost limits the application of these methods to underrepresented languages. We introduce a few-shot learning approach to synthesize large-scale multilingual data from large language models (LLMs). Our method begins with large-scale self-supervised pre-training using WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot supervision. The final model, FsModQA, significantly outperforms existing few-shot and supervised baselines in MLODQA and in cross-lingual and monolingual retrieval. We further show that our method can be extended to effective zero-shot adaptation to new languages through a cross-lingual prompting strategy with only English-supervised data, making it a general and widely applicable solution for MLODQA tasks without costly large-scale annotation.
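As a rough illustration of the few-shot synthesis step the abstract describes, the sketch below builds a prompt from a handful of annotated (passage, question, answer) demonstrations and asks an LLM to produce a QA pair for each unlabeled target-language passage. This is a minimal sketch, not the paper's actual pipeline: the prompt template, the `QAExample` structure, and the `generate` callable (standing in for whatever LLM API is used) are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAExample:
    """One annotated (question, answer) pair grounded in a passage."""
    passage: str
    question: str
    answer: str


def build_fewshot_prompt(examples: List[QAExample], passage: str) -> str:
    """Assemble a few-shot prompt: k annotated demonstrations followed by
    a new passage for which the LLM should complete a question and answer."""
    parts = [
        f"Passage: {ex.passage}\nQuestion: {ex.question}\nAnswer: {ex.answer}\n"
        for ex in examples
    ]
    parts.append(f"Passage: {passage}\nQuestion:")
    return "\n".join(parts)


def synthesize_qa(
    generate: Callable[[str], str],   # any LLM text-completion function
    fewshot: List[QAExample],         # e.g. the five annotated examples
    passages: List[str],              # unlabeled target-language passages
) -> List[QAExample]:
    """Generate one synthetic QA pair per unlabeled passage."""
    synthetic = []
    for passage in passages:
        completion = generate(build_fewshot_prompt(fewshot, passage))
        # Expect the completion to continue the template as
        # "<question>\nAnswer: <answer>"; skip malformed outputs.
        question, _, answer = completion.partition("\nAnswer:")
        if question.strip() and answer.strip():
            synthetic.append(QAExample(passage, question.strip(), answer.strip()))
    return synthetic
```

Under the same assumptions, the zero-shot adaptation the abstract mentions would drive this loop with English demonstrations while instructing the model to write questions in the target language, so no target-language annotation is needed.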
Anthology ID:
2025.tacl-1.24
Volume:
Transactions of the Association for Computational Linguistics, Volume 13
Year:
2025
Address:
Cambridge, MA
Venue:
TACL
Publisher:
MIT Press
Pages:
481–504
URL:
https://preview.aclanthology.org/corrections-2025-07/2025.tacl-1.24/
DOI:
10.1162/tacl_a_00750
Cite (ACL):
Fan Jiang, Tom Drummond, and Trevor Cohn. 2025. Few-Shot Multilingual Open-Domain QA from Five Examples. Transactions of the Association for Computational Linguistics, 13:481–504.
Cite (Informal):
Few-Shot Multilingual Open-Domain QA from Five Examples (Jiang et al., TACL 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-07/2025.tacl-1.24.pdf