On-Policy Self-Distillation for Efficient Diffusion Language Models with Early-Stage Calibration

Huaisheng Zhu, MingYu Liu, Junze Liu, Zhen Ge, Tian Wang, Jiri Gesi, Dakuo Wang, Weiqi Zhang, Houyu Zhang, Yufan Guo, Xian Li, Bing Yin, Sujay Sanghavi


Abstract
Diffusion Large Language Models (DLLMs) have recently achieved strong performance; for example, masked diffusion models (MDMs) can surpass autoregressive models (ARMs) on various tasks. However, DLLMs often make inaccurate predictions in early decoding steps because little context is available, which hinders both the model’s inference efficiency and the overall quality of its output. We propose Calibrated On-Policy Self-Distillation (COPSD) for DLLMs, a simple and efficient method that calibrates early token predictions without requiring demonstration data. COPSD distills an unnormalized target distribution, derived from later decoding steps, into the original model, enabling more accurate early predictions during inference. Experiments on math, planning, and RLHF tasks show that COPSD improves both effectiveness and efficiency, and that it further enhances performance when combined with supervised fine-tuning.
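The abstract does not say how the target distribution is built from later decoding steps, or which divergence COPSD minimizes. As an illustration only, the sketch below assumes the target arrives as nonnegative, unnormalized per-token weights for one masked position (`target_weights`). It normalizes them and scores the early-step prediction (`student_logits`) with the KL divergence KL(target ‖ student). All function names are hypothetical, and this is a generic distillation loss, not the COPSD objective.

```python
import math


def normalize(weights):
    """Turn nonnegative unnormalized scores into a probability distribution."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("target weights must have a positive sum")
    return [w / total for w in weights]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def distill_loss(student_logits, target_weights):
    """Hypothetical per-position loss: KL(normalized target || student softmax).

    Terms with zero target probability contribute nothing, by the usual
    0 * log 0 = 0 convention.
    """
    p = normalize(target_weights)
    q = softmax(student_logits)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def mean_distill_loss(student_logits_per_pos, target_weights_per_pos):
    """Average the per-position loss over the (assumed) masked positions."""
    losses = [
        distill_loss(s, t)
        for s, t in zip(student_logits_per_pos, target_weights_per_pos)
    ]
    return sum(losses) / len(losses)
```

For example, target weights `[2.0, 1.0, 1.0]` normalize to `[0.5, 0.25, 0.25]`. Student logits `[math.log(2), 0.0, 0.0]` produce exactly that distribution, so the loss is zero, while uniform logits give a positive loss. Normalizing before the divergence is what lets an unnormalized target be compared with the student's softmax.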
Anthology ID:
2026.findings-acl.1344
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
26954–26965
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1344/
Cite (ACL):
Huaisheng Zhu, MingYu Liu, Junze Liu, Zhen Ge, Tian Wang, Jiri Gesi, Dakuo Wang, Weiqi Zhang, Houyu Zhang, Yufan Guo, Xian Li, Bing Yin, and Sujay Sanghavi. 2026. On-Policy Self-Distillation for Efficient Diffusion Language Models with Early-Stage Calibration. In Findings of the Association for Computational Linguistics: ACL 2026, pages 26954–26965, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
On-Policy Self-Distillation for Efficient Diffusion Language Models with Early-Stage Calibration (Zhu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1344.pdf
Checklist:
2026.findings-acl.1344.checklist.pdf