SPIDE: Serial and Parallel Intertwined Speculative Decoding
Wenru Xu, Peixuan Xu, Ziqi Yang, Ming Hu, Zihui Wang, Jianzhong Qi, Rongshan Yu, Xiaoliang Fan, Cheng Wang
Abstract
Speculative Decoding (SD) reduces inference latency for Large Language Models (LLMs) by using an efficient draft model to generate candidate tokens, which the target model then verifies. To improve acceleration while reducing LLM usage costs, we propose Serial and Parallel Intertwined Speculative DEcoding (SPIDE), a novel training-free SD framework that dynamically alternates between serial dynamic drafting and parallel draft verification. We maintain a confidence-acceptance mapping table throughout decoding. In the serial dynamic drafting module, we use this table to evaluate the reliability of the draft sequence and adjust draft lengths adaptively. In the parallel draft verification module, we alleviate the drafting-termination conflicts that compromise efficiency, and we update the mapping table synchronously. We evaluate SPIDE on diverse model pairs and text generation tasks. Compared with autoregressive decoding, SPIDE achieves a 3.25× speedup on average and up to 4.56×. Compared with vanilla SD, SPIDE increases LLM usage cost by only 8.2% on average while delivering an additional 67.7% speedup on average.
- Anthology ID:
- 2026.findings-acl.1040
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 20762–20779
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1040/
- Cite (ACL):
- Wenru Xu, Peixuan Xu, Ziqi Yang, Ming Hu, Zihui Wang, Jianzhong Qi, Rongshan Yu, Xiaoliang Fan, and Cheng Wang. 2026. SPIDE: Serial and Parallel Intertwined Speculative Decoding. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20762–20779, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- SPIDE: Serial and Parallel Intertwined Speculative Decoding (Xu et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1040.pdf
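
The abstract describes a confidence-acceptance mapping table that is used to judge how reliable a draft sequence is and to adapt the draft length. The abstract does not give the table's layout or the stopping rule, so the following is only a minimal sketch under stated assumptions: confidences are bucketed into equal-width bins, each bin tracks an empirical acceptance rate, and drafting stops once the product of estimated acceptance rates falls below a threshold. The names `ConfidenceAcceptanceTable`, `choose_draft_length`, `num_bins`, `threshold`, and `max_len` are hypothetical and are not taken from the paper.

```python
class ConfidenceAcceptanceTable:
    """Hypothetical table: draft-confidence bin -> observed acceptance rate.

    Assumption: equal-width bins over [0, 1], each seeded with one accepted
    and one rejected pseudo-observation so early estimates start at 0.5.
    """

    def __init__(self, num_bins=10):
        self.num_bins = num_bins
        self.accepted = [1.0] * num_bins
        self.total = [2.0] * num_bins

    def _bin(self, confidence):
        # Clamp so that a confidence of exactly 1.0 falls in the last bin.
        return min(int(confidence * self.num_bins), self.num_bins - 1)

    def acceptance_rate(self, confidence):
        b = self._bin(confidence)
        return self.accepted[b] / self.total[b]

    def update(self, confidence, was_accepted):
        # Called after verification with the target model's decision.
        b = self._bin(confidence)
        self.total[b] += 1.0
        if was_accepted:
            self.accepted[b] += 1.0


def choose_draft_length(confidences, table, threshold=0.4, max_len=8):
    """Return how many drafted tokens to keep before verification.

    Assumption: the reliability of a draft prefix is the product of the
    estimated per-token acceptance rates; drafting stops when it drops
    below `threshold` or `max_len` tokens have been kept.
    """
    reliability = 1.0
    kept = 0
    for confidence in confidences[:max_len]:
        reliability *= table.acceptance_rate(confidence)
        if reliability < threshold:
            break
        kept += 1
    return kept


if __name__ == "__main__":
    table = ConfidenceAcceptanceTable()
    print(choose_draft_length([0.95, 0.95, 0.95], table))
    for _ in range(8):
        table.update(0.95, True)  # high-confidence tokens keep being accepted
    print(choose_draft_length([0.95, 0.95, 0.95, 0.2], table))
```

In this sketch, a fresh table keeps drafts short because every bin starts at an estimated acceptance rate of 0.5; as high-confidence tokens are repeatedly accepted, the same confidences justify longer drafts. The parallel verification and conflict handling mentioned in the abstract are not modeled here.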