Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules

Amr Mohamed, Yang Zhang, Michalis Vazirgiannis, Guokan Shang


Abstract
Diffusion large language models (dLLMs) offer a promising alternative to autoregressive models, but their practical utility is severely hampered by slow, iterative sampling. We present *SchED*, a training-free, model-agnostic early-exit algorithm that terminates diffusion decoding using a progress-aware confidence threshold. We evaluate *SchED* across multiple diffusion model families and a diverse set of benchmarks spanning multiple-choice, math, long-form QA, and translation. *SchED* delivers substantial acceleration: on instruction-tuned models, it achieves approximately speedups while retaining baseline performance on average. On base models, *SchED* yields consistent speedup gains with 99.1–100% performance retention, with up to 2.34× under more aggressive settings. Under a conservative quality–penalized speed metric, *SchED* consistently outperforms prior confidence-based early-exit methods, including on long-form generation where existing approaches tend to break down. An entropy analysis of the model’s token predictions reveals that instruction tuning speeds up the decay of predictive entropy. By leveraging inherent confidence stabilization as a signal for computational efficiency, *SchED* provides a robust framework for efficient dLLM inference.
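The abstract describes the early-exit rule only at a high level, so the following is a minimal sketch of how a progress-aware confidence threshold could work, not the authors' implementation. Everything specific here is an assumption made for illustration: the names `linear_threshold` and `early_exit_step`, the linear shape of the schedule, the bounds `tau_high` and `tau_low`, the choice to relax the threshold as decoding progresses, and the use of the mean token confidence as the aggregate signal.

```python
from typing import Callable, Sequence


def linear_threshold(progress: float, tau_high: float = 0.95, tau_low: float = 0.5) -> float:
    """Hypothetical schedule: the required confidence relaxes linearly
    from tau_high (progress = 0) to tau_low (progress = 1)."""
    p = min(max(progress, 0.0), 1.0)  # clamp progress to [0, 1]
    return tau_high + (tau_low - tau_high) * p


def early_exit_step(
    confidences_per_step: Sequence[Sequence[float]],
    max_steps: int,
    schedule: Callable[[float], float] = linear_threshold,
) -> int:
    """Return the 1-based denoising step at which decoding would stop.

    confidences_per_step[i] holds the per-token confidences (for example,
    the top-1 probabilities) observed after step i + 1. Using the mean as
    the aggregate is an assumption of this sketch.
    """
    for step, conf in enumerate(confidences_per_step, start=1):
        progress = step / max_steps
        aggregate = sum(conf) / len(conf)
        if aggregate >= schedule(progress):
            return step  # confident enough for this stage: exit early
    return max_steps  # threshold never met: run the full budget


# Toy trace: confidence rises over three steps of a four-step budget.
trace = [[0.3, 0.4], [0.6, 0.7], [0.9, 0.95]]
print(early_exit_step(trace, max_steps=4))
```

In this toy trace the mean confidence first meets the scheduled threshold at the third step, so the fourth step would be skipped. In a real decoder the confidences would come from the model's token distributions at each denoising step rather than from a precomputed list.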
Anthology ID:
2026.findings-acl.1782
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
35793–35807
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1782/
Cite (ACL):
Amr Mohamed, Yang Zhang, Michalis Vazirgiannis, and Guokan Shang. 2026. Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35793–35807, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules (Mohamed et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1782.pdf
Checklist:
 2026.findings-acl.1782.checklist.pdf