Cheng Wang
2026
SPIDE: Serial and Parallel Intertwined Speculative Decoding
Wenru Xu | Peixuan Xu | Ziqi Yang | Ming Hu | Zihui Wang | Jianzhong Qi | Rongshan Yu | Xiaoliang Fan | Cheng Wang
Findings of the Association for Computational Linguistics: ACL 2026
Speculative Decoding (SD) reduces inference latency for Large Language Models (LLMs) by using an efficient draft model to generate candidate tokens, which the target model then verifies. To enhance acceleration while reducing LLM usage costs, we propose Serial and Parallel Intertwined Speculative DEcoding (SPIDE), a novel training-free SD framework that dynamically alternates between serial dynamic drafting and parallel draft verification. We maintain a confidence-acceptance mapping table during decoding. In the serial dynamic drafting module, we use this table to evaluate the reliability of the draft sequence and adjust draft lengths adaptively. In the parallel draft verification module, we alleviate drafting-termination conflicts that compromise efficiency, and we update the mapping table synchronously. We evaluate SPIDE on diverse model pairs and text generation tasks. Compared with autoregressive decoding, SPIDE achieves a 3.25× speedup on average and up to 4.56×. Compared with vanilla SD, SPIDE increases the LLM usage cost by only 8.2% on average while delivering an additional 67.7% speedup on average.
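The abstract does not spell out how the confidence-acceptance mapping table is built or queried, so the following is only a minimal sketch of one way such a table could drive adaptive draft lengths. Everything in it is an assumption made for illustration, not the authors' implementation: the class `ConfidenceAcceptanceTable`, the function `choose_draft_length`, the equal-width confidence buckets, the smoothing prior, the product-of-rates reliability estimate, and the `min_reliability` and `max_len` parameters are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConfidenceAcceptanceTable:
    """Hypothetical table: running acceptance rate per draft-confidence bucket."""
    num_bins: int = 10
    accepted: list = field(default_factory=list)
    seen: list = field(default_factory=list)

    def __post_init__(self):
        # Assumed smoothing prior: each bucket starts at 1 accepted out of 2 seen.
        self.accepted = [1.0] * self.num_bins
        self.seen = [2.0] * self.num_bins

    def _bin(self, confidence: float) -> int:
        # Map a confidence in [0, 1] to an equal-width bucket index.
        return min(int(confidence * self.num_bins), self.num_bins - 1)

    def acceptance_rate(self, confidence: float) -> float:
        b = self._bin(confidence)
        return self.accepted[b] / self.seen[b]

    def update(self, confidence: float, was_accepted: bool) -> None:
        # Called after verification, once the target model's verdict is known.
        b = self._bin(confidence)
        self.seen[b] += 1.0
        if was_accepted:
            self.accepted[b] += 1.0


def choose_draft_length(table, confidences, min_reliability=0.3, max_len=8):
    """Count draft tokens to keep before the estimated reliability drops too low.

    Reliability is approximated (an assumption) as the product of the
    per-token acceptance rates looked up in the table.
    """
    reliability = 1.0
    length = 0
    for c in confidences[:max_len]:
        reliability *= table.acceptance_rate(c)
        if reliability < min_reliability:
            break
        length += 1
    return length


# Usage: feed verification outcomes back into the table, then query it.
table = ConfidenceAcceptanceTable()
for _ in range(8):
    table.update(0.95, True)    # high-confidence drafts were accepted
    table.update(0.15, False)   # low-confidence drafts were rejected
long_draft = choose_draft_length(table, [0.95] * 8)
short_draft = choose_draft_length(table, [0.15, 0.95])
```

In a real decoder the draft confidences would arrive one token at a time rather than as a list, and the stopping rule would be evaluated inside the drafting loop; the list form is used here only to keep the sketch self-contained.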