Mitigating Hallucinations in VLMs: Enhancing Visual Attention via Head-Wise Perturbation

Zhenghua Wang, Yixin Wu, Feiran Zhang, Qi Qian, Changze Lv, Xuanjing Huang, Xiaoqing Zheng


Abstract
Vision–Language Models (VLMs) have demonstrated strong capabilities in tasks that require joint understanding of text and images. However, as many VLMs are built upon pre-trained large language models, they often over-rely on linguistic priors at the expense of visual features, causing persistent hallucinations. We observe that these hallucinations stem not only from insufficient visual attention but also from imbalanced activation profiles across attention heads: hallucinated samples tend to disproportionately activate heads that fail to capture visual cues. To promote a more balanced attention distribution, we propose **HWP**, a strategy that incorporates head-wise attention perturbation via continuous multiplicative noise, coupled with a visual-guided loss focused on vision-sensitive text tokens. Beyond simply strengthening visual grounding, this design encourages a broader set of attention heads to engage with visual signals, thereby alleviating information loss caused by activation concentration on a few dominant heads. Consistent gains across different architectures and scales on multiple benchmarks demonstrate the effectiveness and robustness of our approach in mitigating VLM hallucinations.
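The abstract does not give the exact form of the perturbation, so the following is only a minimal sketch of what head-wise continuous multiplicative noise could look like. It assumes one scalar factor per attention head, drawn from a Gaussian with mean 1, applied during training only. The function name `perturb_heads`, the `sigma` hyperparameter, and the choice of noise distribution are all hypothetical and not taken from the paper; the visual-guided loss is not sketched here.

```python
import random


def perturb_heads(head_outputs, sigma=0.1, training=True, rng=None):
    """Scale each attention head's output by its own continuous noise factor.

    head_outputs: list of per-head vectors (a list of lists of floats).
    sigma: standard deviation of the noise (hypothetical hyperparameter).
    Returns (perturbed_outputs, factors).
    """
    rng = rng or random.Random()
    if not training or sigma == 0.0:
        # Assumption: at inference time the heads pass through unchanged.
        return [list(h) for h in head_outputs], [1.0] * len(head_outputs)
    # Assumption: one factor per head, drawn from N(1, sigma^2),
    # so the expected scale of every head stays at 1.
    factors = [rng.gauss(1.0, sigma) for _ in head_outputs]
    perturbed = [[f * x for x in h] for f, h in zip(factors, head_outputs)]
    return perturbed, factors


# Three toy heads, each with a two-dimensional output vector.
heads = [[1.0, 2.0], [0.5, -0.5], [3.0, 0.0]]
noisy, factors = perturb_heads(heads, sigma=0.1, rng=random.Random(0))
clean, unit = perturb_heads(heads, training=False)
```

Because every head receives an independent factor, no single head can be relied on at a fixed scale during training, which is one plausible way to discourage activation from concentrating on a few dominant heads.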
Anthology ID:
2026.findings-acl.1016
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
20310–20321
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1016/
Cite (ACL):
Zhenghua Wang, Yixin Wu, Feiran Zhang, Qi Qian, Changze Lv, Xuanjing Huang, and Xiaoqing Zheng. 2026. Mitigating Hallucinations in VLMs: Enhancing Visual Attention via Head-Wise Perturbation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20310–20321, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Mitigating Hallucinations in VLMs: Enhancing Visual Attention via Head-Wise Perturbation (Wang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1016.pdf
Checklist:
 2026.findings-acl.1016.checklist.pdf