@inproceedings{qi-etal-2025-evaluating,
  title     = {Evaluating {LLMs}' Assessment of Mixed-Context Hallucination Through the Lens of Summarization},
  author    = {Qi, Siya and
               Cao, Rui and
               He, Yulan and
               Yuan, Zheng},
  editor    = {Che, Wanxiang and
               Nabende, Joyce and
               Shutova, Ekaterina and
               Pilehvar, Mohammad Taher},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2025},
  month     = jul,
  year      = {2025},
  address   = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  pages     = {16480--16503},
  doi       = {10.18653/v1/2025.findings-acl.847},
  url       = {https://aclanthology.org/2025.findings-acl.847/},
  isbn      = {979-8-89176-256-5},
  abstract  = {With the rapid development of large language models (LLMs), LLM-as-a-judge has emerged as a widely adopted approach for text quality evaluation, including hallucination evaluation. While previous studies have focused exclusively on single-context evaluation (e.g., discourse faithfulness or world factuality), real-world hallucinations typically involve mixed contexts, which remains inadequately evaluated. In this study, we use summarization as a representative task to comprehensively evaluate LLMs' capability in detecting mixed-context hallucinations, specifically distinguishing between factual and non-factual hallucinations. Through extensive experiments across direct generation and retrieval-based models of varying scales, our main observations are: (1) LLMs' intrinsic knowledge introduces inherent biases in hallucination evaluation; (2) These biases particularly impact the detection of factual hallucinations, yielding a significant performance bottleneck; and (3) the fundamental challenge lies in effective knowledge utilization, balancing between LLMs' intrinsic knowledge and external context for accurate mixed-context hallucination evaluation.},
}
@comment{Markdown (Informal):
[Evaluating LLMs' Assessment of Mixed-Context Hallucination Through the Lens of Summarization](https://aclanthology.org/2025.findings-acl.847/) (Qi et al., Findings of ACL 2025)
}