Mitigating Hallucinations in VLMs: Enhancing Visual Attention via Head-Wise Perturbation
Zhenghua Wang, Yixin Wu, Feiran Zhang, Qi Qian, Changze Lv, Xuanjing Huang, Xiaoqing Zheng
Abstract
Vision–Language Models (VLMs) have demonstrated strong capabilities in tasks that require joint understanding of text and images. However, because many VLMs are built upon pre-trained large language models, they often over-rely on linguistic priors at the expense of visual features, which causes persistent hallucinations. We observe that these hallucinations stem not only from insufficient visual attention but also from imbalanced activation profiles across attention heads: hallucinated samples tend to disproportionately activate heads that fail to capture visual cues. To promote a more balanced attention distribution, we propose **HWP**, a strategy that incorporates head-wise attention perturbation via continuous multiplicative noise, coupled with a visual-guided loss focused on vision-sensitive text tokens. Beyond simply strengthening visual grounding, this design encourages a broader set of attention heads to engage with visual signals, thereby alleviating the information loss caused by activation concentrating on a few dominant heads. Consistent gains across different architectures and scales on multiple benchmarks demonstrate the effectiveness and robustness of our approach in mitigating VLM hallucinations.
- Anthology ID:
- 2026.findings-acl.1016
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 20310–20321
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1016/
- Cite (ACL):
- Zhenghua Wang, Yixin Wu, Feiran Zhang, Qi Qian, Changze Lv, Xuanjing Huang, and Xiaoqing Zheng. 2026. Mitigating Hallucinations in VLMs: Enhancing Visual Attention via Head-Wise Perturbation. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20310–20321, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Mitigating Hallucinations in VLMs: Enhancing Visual Attention via Head-Wise Perturbation (Wang et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1016.pdf
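The abstract names two mechanisms: a diagnosis (how much attention each head places on visual input) and a remedy (head-wise perturbation of attention via continuous multiplicative noise). The NumPy sketch below illustrates both ideas in their simplest form and is not the authors' implementation. The function names `visual_attention_share` and `headwise_multiplicative_noise`, the log-normal noise distribution, the scale `sigma`, and the choice to perturb per-head attention outputs are all hypothetical assumptions. The visual-guided loss on vision-sensitive text tokens is not sketched, because the abstract gives no details about it.

```python
import numpy as np


def visual_attention_share(attn, image_mask):
    """Per-head fraction of attention mass that lands on image tokens.

    attn: (num_heads, q_len, k_len) attention weights, each row summing to 1.
    image_mask: boolean array of shape (k_len,) marking image-token keys.
    Returns an array of shape (num_heads,), averaged over query positions.
    """
    return attn[:, :, image_mask].sum(axis=-1).mean(axis=-1)


def headwise_multiplicative_noise(head_outputs, sigma=0.1, rng=None, training=True):
    """Scale each head's output by a single random factor drawn per head.

    head_outputs: (num_heads, seq_len, head_dim) per-head attention outputs.
    Assumption: factors are log-normal, m_h = exp(sigma * z_h - sigma**2 / 2)
    with z_h ~ N(0, 1), which keeps E[m_h] = 1. The paper's actual noise
    distribution and insertion point may differ.
    """
    if not training or sigma == 0.0:
        # Assumed: no perturbation at inference time, as with dropout.
        return head_outputs, np.ones(head_outputs.shape[0])
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(head_outputs.shape[0])
    factors = np.exp(sigma * z - 0.5 * sigma ** 2)
    # Broadcast one factor per head over every position and channel.
    return head_outputs * factors[:, None, None], factors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn = np.full((2, 3, 4), 0.25)  # uniform attention over 4 keys
    mask = np.array([True, True, False, False])  # first 2 keys are image tokens
    print(visual_attention_share(attn, mask))  # [0.5 0.5]

    out, m = headwise_multiplicative_noise(np.ones((8, 4, 16)), sigma=0.2, rng=rng)
    print(out.shape, np.allclose(out[3], m[3]))  # (8, 4, 16) True
```

Drawing one factor per head, rather than one per element as standard dropout does, keeps each head's internal pattern intact while randomly changing how strongly that head contributes. This illustrates one way a model could be discouraged from relying on a few dominant heads. A low `visual_attention_share` on a head would flag it as one that, in the abstract's terms, fails to capture visual cues.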