Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation

Yu Wang, Jiaxin Zhang, Xiang Gao, Wendi Cui, Peng Li, Kamalika Das


Abstract
In tasks such as summarization and open-book question answering (QA), Large Language Models (LLMs) frequently experience “contextual hallucination”, where they generate irrelevant or incorrect responses despite having access to accurate information in the input. This issue often stems from the models’ propensity to prioritize self-generated content over the input context, leading to a disregard for pertinent details. To address this challenge, we introduce Guided Attention Map Editing (GAME), an innovative approach that dynamically adjusts attention maps to enhance contextual relevance. During inference, GAME employs a trained classifier to identify attention maps likely to induce hallucinations and implements targeted interventions. These interventions, guided by gradient-informed “edit directions”, strategically redistribute attention weights across various heads to efficiently mitigate hallucination. Extensive evaluations on challenging summarization and open-book QA tasks demonstrate that GAME consistently and significantly reduces hallucinations across diverse open-source models, thereby improving the reliability and applicability of LLMs.
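
The abstract describes GAME at a high level: a trained classifier flags hallucination-prone attention maps, and gradient-informed "edit directions" redistribute attention weights across heads. The Python sketch below illustrates that general idea only; the probe architecture, the gradient-step update, and all names (HallucinationProbe, gradient_guided_edit, step) are illustrative assumptions, not the authors' released implementation.

# Illustrative sketch of gradient-guided attention editing, assuming a toy
# classifier ("probe") that scores an attention map for hallucination risk.
import torch
import torch.nn as nn

class HallucinationProbe(nn.Module):
    """Toy classifier mapping a (heads, seq, seq) attention map to a risk score."""
    def __init__(self, num_heads: int, seq_len: int):
        super().__init__()
        self.linear = nn.Linear(num_heads * seq_len * seq_len, 1)

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        return self.linear(attn.flatten(start_dim=-3)).squeeze(-1)

def gradient_guided_edit(attn: torch.Tensor, probe: HallucinationProbe,
                         step: float = 0.1) -> torch.Tensor:
    """Nudge attention weights along the negative gradient of the predicted
    hallucination score (one possible reading of an "edit direction"),
    then renormalize so each row remains a distribution."""
    attn = attn.clone().requires_grad_(True)
    score = probe(attn).sum()
    grad, = torch.autograd.grad(score, attn)
    with torch.no_grad():
        edited = attn - step * grad              # move toward lower predicted risk
        edited = edited.clamp_min(0.0)
        edited = edited / edited.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    return edited

if __name__ == "__main__":
    heads, seq = 4, 8
    attn = torch.softmax(torch.randn(heads, seq, seq), dim=-1)
    probe = HallucinationProbe(heads, seq)
    edited = gradient_guided_edit(attn, probe)
    print(edited.sum(dim=-1))  # each row still sums to ~1

In the paper's setting the edited maps would replace the original ones during inference for the heads the classifier flags; the sketch above only shows the edit step on a standalone attention tensor.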
Anthology ID:
2025.findings-naacl.458
Volume:
Findings of the Association for Computational Linguistics: NAACL 2025
Month:
April
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
8206–8217
URL:
https://preview.aclanthology.org/landing_page/2025.findings-naacl.458/
Cite (ACL):
Yu Wang, Jiaxin Zhang, Xiang Gao, Wendi Cui, Peng Li, and Kamalika Das. 2025. Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 8206–8217, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation (Wang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-naacl.458.pdf