VADE: Visual Attention Guided Hallucination Detection and Elimination

Vishnu Prabhakaran; Purav Aggarwal; Vinay Kumar Verma; Gokul Swamy; Anoop Saladi

Abstract

Vision Language Models (VLMs) have achieved significant advancements in complex visual understanding tasks. However, VLMs are prone to hallucinations: generating outputs that are not aligned with the visual content. This paper addresses hallucination detection in VLMs by leveraging the visual grounding information encoded in transformer attention maps. We identify three primary challenges in this approach: the elective nature of visual grounding for certain tokens, the high-dimensional and noisy nature of attention maps, and the dynamic sequence length of attention on previous tokens. To address these, we propose VADE, a novel sequence modelling approach that learns complex sequential patterns from high-dimensional, noisy attention maps for fine-grained hallucination detection and mitigation. VADE achieves an average PR-AUC of 80% in hallucination detection on M-HalDetect across four different model architectures and a 5% improvement in hallucination mitigation on MSCOCO.
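
The signal the abstract builds on, how much attention each generated token places on the image, can be illustrated with a short NumPy sketch. This is not VADE's architecture. The function name `visual_attention_features`, the stacked attention layout `(layers, heads, seq_len, seq_len)`, the assumption that image tokens occupy one contiguous span, and the choice to summarize each head by its total attention mass on image tokens are all hypothetical simplifications. The sketch turns each generated token's attention into one fixed-size vector; a sequence model (VADE's own is not reproduced here) would then consume the resulting sequence to flag hallucinated tokens.

```python
import numpy as np


def visual_attention_features(attn, image_slice, gen_start):
    """Hypothetical per-token visual-attention features (illustration only).

    attn: array of shape (layers, heads, seq_len, seq_len); row q holds the
        attention distribution of query position q over key positions.
        This stacked layout is an assumption for the sketch.
    image_slice: slice of key positions occupied by image tokens
        (assumed contiguous, a simplifying assumption).
    gen_start: index of the first generated (response) token.

    Returns an array of shape (num_generated, layers * heads): for each
    generated token, the attention mass each head places on image tokens.
    """
    layers, heads, _, _ = attn.shape
    # Total attention mass on image-token keys -> (layers, heads, seq_len)
    visual_mass = attn[:, :, :, image_slice].sum(axis=-1)
    # Keep only generated-token query positions -> (layers, heads, num_generated)
    visual_mass = visual_mass[:, :, gen_start:]
    # Flatten layers and heads into one feature vector per generated token
    return visual_mass.reshape(layers * heads, -1).T


# Usage with synthetic causal attention (random data, not real model output):
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 10, 10))          # 2 layers, 4 heads, 10 tokens
causal = np.triu(np.ones((10, 10), dtype=bool), k=1)
logits[..., causal] = -np.inf                      # mask future positions
attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)           # row-wise softmax
feats = visual_attention_features(attn, slice(1, 5), gen_start=7)
print(feats.shape)  # one 8-dim vector for each of the 3 generated tokens
```

Summing over image positions gives a vector whose size depends only on the number of layers and heads, which is one simple way to cope with high-dimensional maps; it discards the attention on previous text tokens, whose variable length is exactly the part the abstract says VADE models explicitly.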

Anthology ID: 2025.findings-acl.773
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 14949–14965
URL: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.773/
Cite (ACL): Vishnu Prabhakaran, Purav Aggarwal, Vinay Kumar Verma, Gokul Swamy, and Anoop Saladi. 2025. VADE: Visual Attention Guided Hallucination Detection and Elimination. In Findings of the Association for Computational Linguistics: ACL 2025, pages 14949–14965, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): VADE: Visual Attention Guided Hallucination Detection and Elimination (Prabhakaran et al., Findings 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.773.pdf