XRAG: Cross-lingual Retrieval-Augmented Generation

Wei Liu, Sony Trenous, Leonardo F. R. Ribeiro, Bill Byrne, Felix Hieber

doi:10.18653/v1/2025.findings-emnlp.849

Abstract

We propose XRAG, a novel benchmark designed to evaluate the generation abilities of LLMs in cross-lingual Retrieval-Augmented Generation (RAG) settings, where the user's language does not match the language of the retrieval results. XRAG is constructed from recent news articles to ensure that its questions require external knowledge to answer. It covers two real-world scenarios, monolingual and multilingual retrieval, and provides relevancy annotations for each retrieved document. Our novel dataset construction pipeline yields questions that require complex reasoning, as evidenced by the significant gap between human and LLM performance. Consequently, XRAG serves as a valuable benchmark for studying LLM reasoning abilities, even before the additional cross-lingual complexity is considered. Experimental results on five LLMs uncover two previously unreported challenges in cross-lingual RAG: 1) in the monolingual retrieval setting, all evaluated models struggle with response language correctness; 2) in the multilingual retrieval setting, the main challenge lies in reasoning over retrieved information across languages rather than in generating non-English text.

Anthology ID:: 2025.findings-emnlp.849
Volume:: Findings of the Association for Computational Linguistics: EMNLP 2025
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 15669–15690
URL:: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.849/
DOI:: 10.18653/v1/2025.findings-emnlp.849
Cite (ACL):: Wei Liu, Sony Trenous, Leonardo F. R. Ribeiro, Bill Byrne, and Felix Hieber. 2025. XRAG: Cross-lingual Retrieval-Augmented Generation. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 15669–15690, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: XRAG: Cross-lingual Retrieval-Augmented Generation (Liu et al., Findings 2025)
PDF:: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.849.pdf
Checklist:: 2025.findings-emnlp.849.checklist.pdf