Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models

Nikhil Sharma, Kenton Murray, Ziang Xiao


Abstract
Although the multilingual capability of LLMs offers new opportunities to overcome the language barrier, do these capabilities translate into real-life scenarios where linguistic divides and knowledge conflicts between multilingual sources are known occurrences? In this paper, we studied LLMs’ linguistic preference in a cross-language RAG-based information search setting. We found that LLMs displayed systemic bias towards information in the same language as the query in both document retrieval and answer generation. Furthermore, in scenarios where no information is available in the language of the query, LLMs prefer documents in high-resource languages during generation, potentially reinforcing the dominant views. Such bias exists for both factual and opinion-based queries. Our results highlight the linguistic divide within multilingual LLMs in information search systems. The seemingly beneficial multilingual capability of LLMs may backfire on information parity by reinforcing language-specific filter bubbles, further marginalizing low-resource views.
Anthology ID:
2025.naacl-long.411
Volume:
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
8090–8107
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.411/
Cite (ACL):
Nikhil Sharma, Kenton Murray, and Ziang Xiao. 2025. Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8090–8107, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models (Sharma et al., NAACL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.411.pdf