DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding

Guanghao Li, Zhihui Fu, Min Fang, Qibin Zhao, Ming Tang, Chun Yuan, Jun Wang


Abstract
Autoregressive (AR) decoding in large language models (LLMs) is latency-bounded by strictly sequential token generation.Speculative decoding mitigates this bottleneck by letting a fast drafter propose multi-token candidates that are then verified in parallel by the target model; yet most existing systems still rely on AR drafters, limiting wall-clock gains.We present **DiffuSpec**, which repurposes a *diffusion language model* (DLM) as a *parallel* drafter to generate multi-token proposals in a single forward pass while remaining compatible with standard AR verifiers.However, DLM drafting presents unique challenges: 1) bidirectional conditioning produces a token lattice where locally optimal tokens may fail to form a valid causal sequence; 2) the mechanism requires tuning the draft length, which induces a speed–quality trade-off. To address these issues, we introduce (i) *Causal-consistency Path Search* (CPS) to extract verifier-aligned causal paths from the lattice, and (ii) an *Adaptive Draft-Length* (ADL) controller that adjusts proposal lengths using online acceptance feedback.Across benchmarks, DiffuSpec achieves up to wall-clock speedup and consistently outperforms strong baselines, demonstrating diffusion-based drafting as a competitive alternative to AR drafters for speculative decoding.
Anthology ID:
2026.findings-acl.1048
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
20896–20910
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1048/
DOI:
Bibkey:
Cite (ACL):
Guanghao Li, Zhihui Fu, Min Fang, Qibin Zhao, Ming Tang, Chun Yuan, and Jun Wang. 2026. DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20896–20910, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding (Li et al., Findings 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1048.pdf
Checklist:
 2026.findings-acl.1048.checklist.pdf