Fixing Semantic Blind Spots in Anchor Tokens of dMLLMs

Ruixuan Xu, Jiexi Xu, Qiyan Zhao, Xiaofeng Zhang


Abstract
Recent advances in diffusion-based Multimodal Large Language Models (dMLLMs) offer a compelling alternative to their autoregressive counterparts; however, they remain prone to hallucinations. Through information flow analysis on LLaDA-V, we identify two intertwined factors contributing to this issue. First, although special tokens serve as semantic anchors for aggregating visual information, they simultaneously induce severe attention sinks that consume an excessive share of the model’s attention budget. Second, the long-range decay inherent in Rotary Position Embedding (RoPE) leads to semantic blind spots, preventing these anchors from uniformly perceiving the entire visual input. Accordingly, our objective is to moderately alleviate the attention sink effect on semantic anchors while enhancing their ability to aggregate global visual information, thereby eliminating semantic blind spots. To this end, we propose Extrinsic Distance-Aware Regularization (EDAR), a training-free decoding strategy that augments the attention key space with a static, distance-aware matrix. This matrix jointly redistributes excessive attention away from anchors and injects absolute positional bias to ensure uniform visual coverage. Experiments on LLaDA-V demonstrate that EDAR effectively eliminates semantic blind spots and achieves state-of-the-art performance on both hallucination-specific and general multimodal benchmarks.
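The long-range decay of RoPE mentioned in the abstract can be illustrated with a small numerical sketch. This is not the authors' analysis or the EDAR method. It assumes a toy setting in which the query and key are both all-ones vectors, so the pre-softmax score depends only on the relative distance between the two positions. The function names `rope_score` and `mean_score`, the head dimension of 128, and the base of 10000 are illustrative assumptions.

```python
import math


def rope_score(distance, dim=128, base=10000.0):
    """Toy pre-softmax score between an all-ones query and an all-ones key
    whose positions differ by `distance`, under standard RoPE."""
    # RoPE rotates each 2-D pair i by distance * theta_i, theta_i = base^(-2i/dim).
    # For q = k = (1, 1) within a pair, the rotated dot product is 2 * cos(angle).
    return sum(
        2.0 * math.cos(distance * base ** (-2.0 * i / dim))
        for i in range(dim // 2)
    )


def mean_score(start, stop, dim=128):
    """Average toy score over relative distances in [start, stop)."""
    return sum(rope_score(m, dim) for m in range(start, stop)) / (stop - start)


# Assumed windows: "near" covers distances 1..16, "far" covers 512..1024.
near = mean_score(1, 17)
far = mean_score(512, 1025)

print(f"zero distance:   {rope_score(0):.1f}")
print(f"near (1..16):    {near:.1f}")
print(f"far (512..1024): {far:.1f}")
```

In this toy setting the score peaks at zero distance and is lower on average for distant positions. That is consistent with the abstract's claim that an anchor token attends less readily to visual tokens far from it, though real queries and keys are learned and will behave less regularly.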
Anthology ID:
2026.findings-acl.966
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
19359–19374
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.966/
Cite (ACL):
Ruixuan Xu, Jiexi Xu, Qiyan Zhao, and Xiaofeng Zhang. 2026. Fixing Semantic Blind Spots in Anchor Tokens of dMLLMs. In Findings of the Association for Computational Linguistics: ACL 2026, pages 19359–19374, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Fixing Semantic Blind Spots in Anchor Tokens of dMLLMs (Xu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.966.pdf
Checklist:
 2026.findings-acl.966.checklist.pdf