Junze Liu
2026
On-Policy Self-Distillation for Efficient Diffusion Language Models with Early-Stage Calibration
Huaisheng Zhu | MingYu Liu | Junze Liu | Zhen Ge | Tian Wang | Jiri Gesi | Dakuo Wang | Weiqi Zhang | Houyu Zhang | Yufan Guo | Xian Li | Bing Yin | Sujay Sanghavi
Findings of the Association for Computational Linguistics: ACL 2026
Diffusion Large Language Models (DLLMs) have recently achieved strong performance; for example, masked diffusion models (MDMs) can surpass autoregressive models (ARMs) on various tasks. However, DLLMs often make inaccurate predictions in early decoding steps, when little context is available, which hurts both inference efficiency and overall output quality. We propose Calibrated On-Policy Self-Distillation (COPSD) for DLLMs, a simple and efficient method that calibrates early token predictions without requiring demonstration data. COPSD distills an unnormalized target distribution derived from later decoding steps into the original model, enabling more accurate early predictions during inference. Experiments on math, planning, and RLHF tasks show that COPSD improves both effectiveness and efficiency, and further enhances performance when combined with supervised fine-tuning.