Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding

Shunqi Mao, Chaoyi Zhang, Weidong Cai


Abstract
Existing vision-language models (VLMs) often suffer from visual hallucination, where the generated responses contain inaccuracies that are not grounded in the visual input. Approaches that address this issue without model finetuning primarily mitigate hallucination by contrastively reducing language biases or by amplifying the weights of visual embeddings during decoding. However, these approaches remain limited in their ability to capture fine-grained visual details. In this work, we propose the Perception Magnifier (PM), a novel visual decoding method that iteratively isolates relevant visual tokens based on attention and magnifies the corresponding regions, spurring the model to concentrate on fine-grained visual details during decoding. By magnifying critical regions while preserving the structural and contextual information at each decoding step, PM allows the VLM to enhance its scrutiny of the visual input, hence producing more accurate and faithful responses. Extensive experimental results demonstrate that PM not only achieves superior hallucination mitigation but also enhances language generation while preserving strong reasoning capabilities.
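To make the idea of magnifying attended regions while keeping the rest of the image concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify how attention is aggregated or how the magnification is performed, so everything here is an assumption: the function names `inverse_cdf_coords` and `magnify`, the use of a patch-level attention grid `attn`, the separable inverse-CDF resampling, and the `floor` parameter that reserves part of the output for low-attention regions.

```python
import numpy as np


def inverse_cdf_coords(weights, out_len, floor):
    """Map out_len evenly spaced output samples to source coordinates in [0, n].

    Source bins with larger weight receive more output samples, so they
    appear magnified. `floor` mixes in a uniform distribution so that
    low-weight bins are compressed rather than removed (hypothetical knob).
    """
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()                              # assumes a positive total weight
    w = (1.0 - floor) * w + floor / len(w)       # keep every bin strictly positive
    cdf = np.concatenate([[0.0], np.cumsum(w)])  # CDF evaluated at the bin edges
    edges = np.arange(len(w) + 1, dtype=np.float64)
    u = (np.arange(out_len) + 0.5) / out_len     # centers of the output samples
    return np.interp(u, cdf, edges)


def magnify(image, attn, floor=0.3):
    """Resample `image` (H x W or H x W x C) so high-attention patches grow.

    `attn` is a non-negative (grid_h x grid_w) map over image patches.
    The output has the same shape as the input; sampling is nearest-neighbor.
    """
    h, w = image.shape[:2]
    gh, gw = attn.shape
    # Separable approximation: use the row and column marginals of the map.
    ys = inverse_cdf_coords(attn.sum(axis=1), h, floor) * (h / gh)
    xs = inverse_cdf_coords(attn.sum(axis=0), w, floor) * (w / gw)
    yi = np.clip(ys.astype(int), 0, h - 1)
    xi = np.clip(xs.astype(int), 0, w - 1)
    return image[np.ix_(yi, xi)]
```

In this sketch, a uniform attention map leaves the image unchanged, while a peaked map stretches the attended rows and columns across more output pixels. In a decoding loop one would, under the same assumptions, recompute `attn` from the model's attention over visual tokens at each step and feed the resampled image back to the VLM.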
Anthology ID:
2026.acl-long.2059
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
44480–44501
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2059/
Cite (ACL):
Shunqi Mao, Chaoyi Zhang, and Weidong Cai. 2026. Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 44480–44501, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding (Mao et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.2059.pdf
Checklist:
 2026.acl-long.2059.checklist.pdf