@inproceedings{zhao-etal-2025-mitigating,
title = "Mitigating Hallucination in Large Vision-Language Models through Aligning Attention Distribution to Information Flow",
author = "Zhao, Jianfei and
Zhang, Feng and
Sun, Xin and
Feng, Chong",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1352/",
doi = "10.18653/v1/2025.findings-emnlp.1352",
pages = "24849--24863",
ISBN = "979-8-89176-335-7",
abstract = "Due to the unidirectional masking mechanism, Decoder-Only models propagate information from left to right. LVLMs (Large Vision-Language Models) follow the same architecture, with visual information gradually integrated into semantic representations during forward propagation. Through systematic analysis, we observe that over 80{\%} of the visual information is absorbed into the semantic representations. However, the model{'}s attention still predominantly focuses on the visual representations. This misalignment between the attention distribution and the actual information flow undermines the model{'}s visual understanding ability and contributes to hallucinations.To address this issue, we enhance the model{'}s visual understanding by leveraging the core information embedded in semantic representations. Specifically, we identify attention heads that focus on core semantic representations based on their attention distributions. Then, through a two-stage optimization paradigm, we propagate the advantages of these attention heads across the entire model, aligning the attention distribution with the actual information flow.We evaluate our method on three image captioning benchmarks using five different LVLMs,demonstrating its effectiveness in significantly reducing hallucinations. Further experiments reveal a trade-off between reduced hallucinations and richer details. Notably, our method allows for manual adjustment of the model{'}s conservativeness, enabling flexible control to meet diverse real-world requirements."
}