TokenPenalty: Alleviating Attention Sinks and Positional Decay in LVLMs

Xiaofeng Zhang, Yuanchao Zhu, Qiyan Zhao, Xiaosong Yuan, Jiawei Cao, Xuhang Chen


Abstract
Multimodal large language models (MLLMs) are increasingly deployed in Web-scale applications, such as image search, social media captioning, and e-commerce product description generation, where factual consistency is critical for user trust and content reliability. However, we observe that MLLMs frequently hallucinate in these settings due to two related phenomena: massive activations and positional information decay. The former refers to the tendency of attention mechanisms to concentrate on a small set of tokens with extreme activation values in the query and key projections; these tokens play an indispensable role in contextual understanding. In our investigation, we find that perturbing these tokens leads to significant performance drops, highlighting their importance. Positional information decay arises from the commonly used rotary position encoding, under which attention to early visual tokens diminishes as generation proceeds, especially in long-sequence generation tasks such as image captioning. To address these challenges, we propose TokenTruth, a token-level intervention strategy that dynamically suppresses irrelevant visual tokens while preserving key contextual signals. Our method is grounded in an in-depth analysis of massive activations and attention sink behaviors, and introduces a targeted token penalty mechanism that reallocates attention more faithfully toward informative visual regions. Extensive experiments demonstrate that TokenTruth significantly improves factual consistency across various MLLMs on standard image understanding benchmarks.
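
The positional decay described in the abstract can be illustrated with a small, self-contained sketch of rotary position encoding. This is not the authors' method or code; it only shows the general property that, for a fixed query and key, the rotary attention score tends to shrink as the relative distance grows. The all-ones vectors, the head dimension of 128, the base of 10000, and the names `rope_rotate` and `rope_score` are assumptions chosen for illustration.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Apply a rotary position rotation to `vec` at integer position `pos`.

    Consecutive pairs (vec[i], vec[i+1]) are rotated by pos * theta, where
    theta = base ** (-i / dim) for the even index i that starts the pair.
    """
    dim = len(vec)
    out = [0.0] * dim
    for i in range(0, dim, 2):
        theta = base ** (-i / dim)
        angle = pos * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out


def rope_score(q, k, q_pos, k_pos):
    """Unnormalized attention logit between a rotated query and a rotated key."""
    rq = rope_rotate(q, q_pos)
    rk = rope_rotate(k, k_pos)
    return sum(a * b for a, b in zip(rq, rk))


# Illustrative setup (assumed values, not taken from the paper).
DIM = 128
QUERY_POS = 512
q = [1.0] * DIM
k = [1.0] * DIM

# Score between the query and an identical key placed d tokens earlier.
scores_by_distance = [
    rope_score(q, k, QUERY_POS, QUERY_POS - d) for d in range(0, 257)
]

# Compare the average score for nearby keys against far-away keys.
near_mean = sum(scores_by_distance[1:17]) / 16
far_mean = sum(scores_by_distance[241:257]) / 16

if __name__ == "__main__":
    print(f"distance 0 score: {scores_by_distance[0]:.2f}")
    print(f"mean score, distances 1-16:    {near_mean:.2f}")
    print(f"mean score, distances 241-256: {far_mean:.2f}")
```

In this toy setting, the score depends only on the relative distance between query and key, and keys far from the query receive a lower average score than nearby keys. This is consistent with the abstract's observation that early visual tokens lose attention in long generations, although real models use learned query and key vectors rather than constant ones.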
Anthology ID:
2026.findings-acl.628
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12894–12906
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.628/
Cite (ACL):
Xiaofeng Zhang, Yuanchao Zhu, Qiyan Zhao, Xiaosong Yuan, Jiawei Cao, and Xuhang Chen. 2026. TokenPenalty: Alleviating Attention Sinks and Positional Decay in LVLMs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 12894–12906, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
TokenPenalty: Alleviating Attention Sinks and Positional Decay in LVLMs (Zhang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.628.pdf
Checklist:
 2026.findings-acl.628.checklist.pdf