The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs
Denis Janiak, Jakub Binkowski, Albert Sawczyn, Bogdan Gabrys, Ravid Shwartz-Ziv, Tomasz Jan Kajdanowicz
Abstract
Large language models (LLMs) have revolutionized natural language processing, yet their tendency to hallucinate poses serious challenges for reliable deployment. Although numerous hallucination detection methods have been proposed, their evaluations often rely on ROUGE, a lexical-overlap metric that misaligns with human judgments. Through comprehensive human studies, we demonstrate that while ROUGE exhibits high recall, its extremely low precision leads to misleading performance estimates. In fact, several established detection methods show performance drops of up to 45.9% when assessed with human-aligned metrics such as LLM-as-Judge. Moreover, our analysis reveals that simple heuristics based on response length can rival complex detection techniques, exposing a fundamental flaw in current evaluation practices. We argue that adopting semantically aware and robust evaluation frameworks is essential to accurately gauge the true performance of hallucination detection methods and, ultimately, to ensure the trustworthiness of LLM outputs.
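To make the evaluation pitfall concrete, here is a minimal sketch of the two baselines the abstract discusses: a ROUGE-L (longest-common-subsequence) correctness label and a naive response-length heuristic. The 0.3 score threshold and 20-token cutoff are illustrative assumptions, not values taken from the paper, and the implementation is a plain-Python approximation rather than the authors' evaluation code.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F-measure: LCS-based precision/recall over whitespace tokens."""
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def rouge_correct(reference: str, candidate: str, threshold: float = 0.3) -> bool:
    """ROUGE-style correctness label (hallucination = not correct).
    The 0.3 threshold is a hypothetical choice for illustration."""
    return rouge_l_f1(reference, candidate) >= threshold


def length_flags_hallucination(candidate: str, max_tokens: int = 20) -> bool:
    """Trivial length heuristic: flag long responses as likely hallucinations.
    The 20-token cutoff is likewise hypothetical."""
    return len(candidate.split()) > max_tokens


if __name__ == "__main__":
    ref = "Marie Curie won two Nobel Prizes."
    cand = ("Marie Curie won two Nobel Prizes, in physics and in chemistry "
            "and in literature.")
    print(rouge_l_f1(ref, cand))             # ~0.5: high lexical overlap...
    print(rouge_correct(ref, cand))          # ...so ROUGE labels it correct
    print(length_flags_hallucination(cand))  # length heuristic, cutoff illustrative
```

Note how the candidate fabricates a third prize yet still clears the overlap threshold: this is the high-recall, low-precision failure mode the paper measures against human-aligned judges.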
- Anthology ID: 2025.emnlp-main.1761
- Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 34716–34733
- URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.1761/
- DOI: 10.18653/v1/2025.emnlp-main.1761
- Cite (ACL): Denis Janiak, Jakub Binkowski, Albert Sawczyn, Bogdan Gabrys, Ravid Shwartz-Ziv, and Tomasz Jan Kajdanowicz. 2025. The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 34716–34733, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs (Janiak et al., EMNLP 2025)
- PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.1761.pdf