One Token Is Enough: Improving Diffusion Language Models with a Sink Token

Zihou Zhang, Zheyong Xie, Li Zhong, Haifeng Liu, Shaosheng Cao


Abstract
Diffusion Language Models (DLMs) have emerged as a compelling alternative to autoregressive approaches, enabling parallel text generation with competitive performance. Despite these advantages, DLMs suffer from a critical instability: the moving sink phenomenon. Our analysis indicates that sink tokens exhibit low-norm representations in the Transformer’s value space, and that the moving sink phenomenon serves as a protective mechanism in DLMs to prevent excessive information mixing. However, the unpredictable positions of these sinks across diffusion steps undermine inference robustness. To resolve this, we propose a simple but effective extra sink token implemented via a modified attention mask. Specifically, we introduce a special token constrained to attend solely to itself, while remaining globally visible to all other tokens. Experimental results demonstrate that introducing a single extra token stabilizes attention sinks, substantially improving model performance. Crucially, further analysis confirms that the token’s effectiveness is independent of its position and that the token carries negligible semantic content, validating its role as a robust and dedicated structural sink.
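The masking rule stated in the abstract (the sink token attends solely to itself, while every other token can see it) can be illustrated with a small sketch. This is not the authors' implementation: the function name `build_sink_mask`, the `sink_pos` parameter, and the assumption that all non-sink tokens attend bidirectionally to the full sequence are hypothetical choices made only for illustration.

```python
def build_sink_mask(seq_len, sink_pos=0):
    """Build a boolean attention mask for seq_len ordinary tokens plus one sink token.

    mask[q][k] is True when query position q may attend to key position k.
    Hypothetical helper: the name, signature, and layout are assumptions.
    """
    n = seq_len + 1  # one extra position for the sink token
    if not 0 <= sink_pos < n:
        raise ValueError("sink_pos must lie in [0, seq_len]")

    # Assumption: ordinary tokens use full bidirectional attention,
    # so every query starts out allowed to see every key.
    mask = [[True] * n for _ in range(n)]

    # Sink row: the sink token attends solely to itself.
    mask[sink_pos] = [k == sink_pos for k in range(n)]

    # Sink column is left all True: the sink stays visible to every token.
    return mask


if __name__ == "__main__":
    # Print the mask for 3 ordinary tokens with the sink at position 0.
    for row in build_sink_mask(3):
        print("".join("1" if allowed else "0" for allowed in row))
```

In a framework that accepts boolean attention masks, a matrix of this shape would typically be passed alongside the queries and keys. Whether the paper places the sink at the start of the sequence or elsewhere is not specified here, which is why the position is left as a parameter.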
Anthology ID:
2026.findings-acl.323
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6479–6490
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.323/
Cite (ACL):
Zihou Zhang, Zheyong Xie, Li Zhong, Haifeng Liu, and Shaosheng Cao. 2026. One Token Is Enough: Improving Diffusion Language Models with a Sink Token. In Findings of the Association for Computational Linguistics: ACL 2026, pages 6479–6490, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
One Token Is Enough: Improving Diffusion Language Models with a Sink Token (Zhang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.323.pdf
Checklist:
 2026.findings-acl.323.checklist.pdf