Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration

Jucheng Shen; Gaurav Sarkar; Yeonju Ro; Sharath Nittur Sridhar; Zhangyang Wang; Aditya Akella; Souvik Kundu

Abstract

We present CadLLM, a training-free method to improve the inference throughput of diffusion-based LLMs (dLLMs). We first investigate the dynamic nature of token unmasking confidence across blocks and steps. Based on this observation, we present a lightweight adaptive approach that controls the generation block size, step size, and threshold based on the average confidence score of the unmasked tokens. We further reduce the softmax overhead of token probability generation by dynamically using a subset of the vocabulary to regulate sampling breadth. CadLLM is plug-and-play and model-agnostic, and is compatible with KV-caching-based dLLMs. Extensive experiments on four popular tasks demonstrate that CadLLM yields throughput improvements of 1.1-2.28x over the state-of-the-art baseline with competitive accuracy.
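
The two ideas in the abstract (confidence-driven control of the decoding knobs, and a softmax restricted to part of the vocabulary) can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the update rules, so the linear interpolation policy, the names `DecodeConfig`, `calibrate`, and `topk_softmax`, and the choice of the top-k logits as the vocabulary subset are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class DecodeConfig:
    """Hypothetical bundle of the knobs named in the abstract."""
    block_size: int    # tokens generated per block
    step_size: int     # step-size knob, treated here as a plain integer
    threshold: float   # minimum confidence required to unmask a token
    vocab_k: int       # number of logits kept for the softmax (assumed top-k)


def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def calibrate(mean_conf, low_conf_cfg, high_conf_cfg):
    """Assumed policy: blend two caller-supplied configs by mean confidence.

    mean_conf is the average confidence of recently unmasked tokens and is
    clamped to [0, 1]. Which direction each knob moves is decided entirely
    by the two configs the caller passes in.
    """
    t = min(1.0, max(0.0, mean_conf))
    return DecodeConfig(
        block_size=round(lerp(low_conf_cfg.block_size, high_conf_cfg.block_size, t)),
        step_size=round(lerp(low_conf_cfg.step_size, high_conf_cfg.step_size, t)),
        threshold=lerp(low_conf_cfg.threshold, high_conf_cfg.threshold, t),
        vocab_k=round(lerp(low_conf_cfg.vocab_k, high_conf_cfg.vocab_k, t)),
    )


def topk_softmax(logits, k):
    """Softmax over only the k largest logits.

    Returns {token_id: probability}; tokens outside the top k are omitted,
    which is equivalent to assigning them probability 0.
    """
    k = max(1, min(k, len(logits)))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = logits[top[0]]  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}
```

In use, a caller would pick two endpoint configurations (for example a cautious one for low confidence and an aggressive one for high confidence; the specific values are the caller's choice, not values from the paper), call `calibrate` with the running mean confidence after each block, and pass the resulting `vocab_k` to `topk_softmax` when scoring the next tokens.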

Anthology ID: 2026.findings-acl.478
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9826–9837
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.478/
Cite (ACL): Jucheng Shen, Gaurav Sarkar, Yeonju Ro, Sharath Nittur Sridhar, Zhangyang Wang, Aditya Akella, and Souvik Kundu. 2026. Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration. In Findings of the Association for Computational Linguistics: ACL 2026, pages 9826–9837, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration (Shen et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.478.pdf
Checklist: 2026.findings-acl.478.checklist.pdf
