Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration
Jucheng Shen, Gaurav Sarkar, Yeonju Ro, Sharath Nittur Sridhar, Zhangyang Wang, Aditya Akella, Souvik Kundu
Abstract
We present CadLLM, a training-free method to improve the inference throughput of diffusion-based LLMs (dLLMs). We first investigate the dynamic nature of token unmasking confidence across blocks and steps. Based on this observation, we then present a lightweight adaptive approach that controls the generation block size, step size, and threshold based on the average confidence score of the unmasked tokens. We further reduce the softmax overhead of token probability generation by dynamically using a subset of the vocabulary to regulate sampling breadth. CadLLM is plug-and-play and model-agnostic, and is compatible with KV-caching-based dLLMs. Extensive experiments on four popular tasks demonstrate that CadLLM yields throughput improvements of 1.1-2.28x over the state-of-the-art baseline with competitive accuracy.
- Anthology ID:
- 2026.findings-acl.478
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 9826–9837
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.478/
- Cite (ACL):
- Jucheng Shen, Gaurav Sarkar, Yeonju Ro, Sharath Nittur Sridhar, Zhangyang Wang, Aditya Akella, and Souvik Kundu. 2026. Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration. In Findings of the Association for Computational Linguistics: ACL 2026, pages 9826–9837, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration (Shen et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.478.pdf