DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding
Guanghao Li, Zhihui Fu, Min Fang, Qibin Zhao, Ming Tang, Chun Yuan, Jun Wang
Abstract
Autoregressive (AR) decoding in large language models (LLMs) is latency-bounded by strictly sequential token generation. Speculative decoding mitigates this bottleneck by letting a fast drafter propose multi-token candidates that are then verified in parallel by the target model; yet most existing systems still rely on AR drafters, limiting wall-clock gains. We present **DiffuSpec**, which repurposes a *diffusion language model* (DLM) as a *parallel* drafter to generate multi-token proposals in a single forward pass while remaining compatible with standard AR verifiers. However, DLM drafting presents unique challenges: 1) bidirectional conditioning produces a token lattice where locally optimal tokens may fail to form a valid causal sequence; 2) the mechanism requires tuning the draft length, which induces a speed–quality trade-off. To address these issues, we introduce (i) *Causal-consistency Path Search* (CPS) to extract verifier-aligned causal paths from the lattice, and (ii) an *Adaptive Draft-Length* (ADL) controller that adjusts proposal lengths using online acceptance feedback. Across benchmarks, DiffuSpec achieves up to 3× wall-clock speedup and consistently outperforms strong baselines, demonstrating diffusion-based drafting as a competitive alternative to AR drafters for speculative decoding.
- Anthology ID:
- 2026.findings-acl.1048
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 20896–20910
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1048/
- Cite (ACL):
- Guanghao Li, Zhihui Fu, Min Fang, Qibin Zhao, Ming Tang, Chun Yuan, and Jun Wang. 2026. DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20896–20910, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding (Li et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1048.pdf