Evaluating Large Language Models for Cross-Lingual Retrieval

Longfei Zuo; Pingjun Hong; Oliver Kraus; Barbara Plank; Robert Litschko

doi:10.18653/v1/2025.findings-emnlp.612

Abstract

Multi-stage information retrieval (IR) has become a widely adopted paradigm in search. While Large Language Models (LLMs) have been extensively evaluated as second-stage reranking models for monolingual IR, a systematic large-scale comparison is still lacking for cross-lingual IR (CLIR). Moreover, while prior work shows that LLM-based rerankers improve CLIR performance, its evaluation setup relies on machine translation (MT) for the first stage. This is not only prohibitively expensive but also prone to error propagation across stages. Our evaluation on passage-level and document-level CLIR reveals that this setup, which we term noisy monolingual IR, is favorable for LLMs. However, LLMs still fail to improve the first-stage ranking when it is instead produced by multilingual bi-encoders. We further show that pairwise rerankers based on instruction-tuned LLMs perform competitively with listwise rerankers. To the best of our knowledge, we are the first to study the interaction between retrievers and rerankers in two-stage CLIR with LLMs. Our findings reveal that, without MT, current state-of-the-art rerankers fall severely short when directly applied in CLIR.

Anthology ID: 2025.findings-emnlp.612
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 11415–11429
URL: https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.612/
DOI: 10.18653/v1/2025.findings-emnlp.612
Cite (ACL): Longfei Zuo, Pingjun Hong, Oliver Kraus, Barbara Plank, and Robert Litschko. 2025. Evaluating Large Language Models for Cross-Lingual Retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 11415–11429, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Evaluating Large Language Models for Cross-Lingual Retrieval (Zuo et al., Findings 2025)
PDF: https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.612.pdf
Checklist: 2025.findings-emnlp.612.checklist.pdf
