How Much Do LLMs Hallucinate across Languages? On Realistic Multilingual Estimation of LLM Hallucination

Saad Obaid Ul Islam, Anne Lauscher, Goran Glavaš


Abstract
In the age of misinformation, hallucination—the tendency of Large Language Models (LLMs) to generate non-factual or unfaithful responses—represents the main risk for their global utility. Despite LLMs becoming increasingly multilingual, the vast majority of research on detecting and quantifying LLM hallucination is (a) English-centric and (b) focused on machine translation (MT) and summarization, tasks that are less common in realistic settings than open information seeking. In contrast, we aim to quantify the extent of LLM hallucination across languages in knowledge-intensive long-form question answering (LFQA). To this end, we train a multilingual hallucination detection model and conduct a large-scale study across 30 languages and 6 open-source LLM families. We start from an English hallucination detection dataset and rely on MT to translate-train a detection model. We also manually annotate gold data for five high-resource languages; for these languages, we demonstrate that hallucination-rate estimates are similar between silver (LLM-generated) and gold test sets, validating the use of silver data for estimating hallucination rates in other languages. For the final rate estimation, we build an open-domain QA dataset for 30 languages with LLM-generated prompts and Wikipedia articles as references. Our analysis shows that, in absolute terms, LLMs hallucinate more tokens in high-resource languages due to longer responses, but that the actual hallucination rates (i.e., normalized for length) seem uncorrelated with the size of a language's digital footprint. We also find that smaller LLMs hallucinate more and, notably, that LLMs with broader language support display higher hallucination rates.
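
The distinction between absolute hallucinated-token counts and length-normalized hallucination rates is central to the abstract's findings. The sketch below is a minimal illustration of that distinction, assuming per-token binary hallucination labels produced by a detection model; the function name, data format, and example values are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def hallucination_stats(responses):
    """Aggregate per-language hallucination statistics.

    `responses` is a list of dicts with keys:
      - "lang":   language code of the LLM response
      - "labels": per-token binary labels from a hallucination
                  detector (1 = hallucinated token, 0 = supported)
    This format is illustrative, not the paper's actual data schema.
    """
    totals = defaultdict(lambda: {"hallucinated": 0, "tokens": 0})
    for r in responses:
        totals[r["lang"]]["hallucinated"] += sum(r["labels"])
        totals[r["lang"]]["tokens"] += len(r["labels"])

    stats = {}
    for lang, t in totals.items():
        stats[lang] = {
            # absolute count: grows with response length
            "hallucinated_tokens": t["hallucinated"],
            # length-normalized rate: comparable across languages
            "hallucination_rate": (
                t["hallucinated"] / t["tokens"] if t["tokens"] else 0.0
            ),
        }
    return stats

# A longer response can contain more hallucinated tokens in absolute
# terms while having a similar (or lower) normalized rate.
example = [
    {"lang": "en", "labels": [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]},
    {"lang": "yo", "labels": [0, 1, 0, 0]},
]
print(hallucination_stats(example))
```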
Anthology ID:
2025.emnlp-main.1481
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
29065–29086
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1481/
Cite (ACL):
Saad Obaid Ul Islam, Anne Lauscher, and Goran Glavaš. 2025. How Much Do LLMs Hallucinate across Languages? On Realistic Multilingual Estimation of LLM Hallucination. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 29065–29086, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
How Much Do LLMs Hallucinate across Languages? On Realistic Multilingual Estimation of LLM Hallucination (Ul Islam et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1481.pdf
Checklist:
 2025.emnlp-main.1481.checklist.pdf