SPIDE: Serial and Parallel Intertwined Speculative Decoding

Wenru Xu; Peixuan Xu; Ziqi Yang; Ming Hu; Zihui Wang; Jianzhong Qi; Rongshan Yu; Xiaoliang Fan; Cheng Wang

Abstract

Speculative Decoding (SD) reduces inference latency for Large Language Models (LLMs) by using an efficient draft model to generate candidate tokens, which the target model then verifies. To increase acceleration while limiting LLM usage costs, we propose Serial and Parallel Intertwined Speculative DEcoding (SPIDE), a novel training-free SD framework that dynamically alternates serial dynamic drafting with parallel draft verification. We maintain a confidence-acceptance mapping table during the decoding process. In the serial dynamic drafting module, we use this table to evaluate the reliability of the draft sequence and adjust draft lengths adaptively. In the parallel draft verification module, we alleviate drafting-termination conflicts that compromise efficiency, and we update the mapping table synchronously. We evaluate SPIDE on diverse model pairs and text generation tasks. Compared with autoregressive decoding, SPIDE achieves a 3.25× speedup on average and up to 4.56×. Compared with vanilla SD, SPIDE increases the LLM usage cost by only 8.2% on average while delivering an additional 67.7% speedup on average.
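
To make the idea of a confidence-acceptance mapping table more concrete, the following is a minimal illustrative sketch and not the authors' implementation. The abstract does not specify how the table is structured or how draft lengths are chosen, so everything here is an assumption: the class `ConfidenceAcceptanceTable`, the function `choose_draft_length`, the use of equal-width confidence bins, the smoothing prior, and the stopping rule based on a cumulative acceptance estimate are all hypothetical.

```python
class ConfidenceAcceptanceTable:
    """Hypothetical table: maps binned draft-token confidence to the
    observed rate at which the target model accepted such tokens."""

    def __init__(self, num_bins=10):
        self.num_bins = num_bins
        # Assumed smoothing prior: every bin starts at 1 accepted out of 2,
        # i.e. an estimated acceptance rate of 0.5 before any observation.
        self.accepted = [1.0] * num_bins
        self.total = [2.0] * num_bins

    def _bin(self, confidence):
        # Equal-width bins over [0, 1]; confidence 1.0 falls in the last bin.
        return min(int(confidence * self.num_bins), self.num_bins - 1)

    def acceptance_rate(self, confidence):
        b = self._bin(confidence)
        return self.accepted[b] / self.total[b]

    def update(self, confidence, was_accepted):
        # Called after verification with the outcome for one draft token.
        b = self._bin(confidence)
        self.total[b] += 1.0
        if was_accepted:
            self.accepted[b] += 1.0


def choose_draft_length(table, confidences, threshold=0.5, max_len=8):
    """Return how many leading draft tokens keep the estimated probability
    that all of them are accepted at or above `threshold` (assumed rule)."""
    survival = 1.0
    for i, conf in enumerate(confidences[:max_len]):
        survival *= table.acceptance_rate(conf)
        if survival < threshold:
            return i  # token i pushed the estimate below the threshold
    return min(len(confidences), max_len)


# Usage: simulate verification feedback, then pick a draft length.
table = ConfidenceAcceptanceTable()
for _ in range(20):
    table.update(0.95, True)   # high-confidence tokens were accepted
    table.update(0.15, False)  # low-confidence tokens were rejected
draft_len = choose_draft_length(table, [0.95, 0.95, 0.95, 0.15, 0.95])
```

In this sketch the draft is cut just before the low-confidence token, because the table has learned that tokens in that confidence range are rarely accepted. A real system would obtain the confidences from the draft model's output distribution and the acceptance outcomes from the target model's verification step.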

Anthology ID: 2026.findings-acl.1040
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 20762–20779
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1040/
Cite (ACL): Wenru Xu, Peixuan Xu, Ziqi Yang, Ming Hu, Zihui Wang, Jianzhong Qi, Rongshan Yu, Xiaoliang Fan, and Cheng Wang. 2026. SPIDE: Serial and Parallel Intertwined Speculative Decoding. In Findings of the Association for Computational Linguistics: ACL 2026, pages 20762–20779, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SPIDE: Serial and Parallel Intertwined Speculative Decoding (Xu et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1040.pdf
Checklist: 2026.findings-acl.1040.checklist.pdf
