Speculative Verification: Exploiting Information Gain for Speculative Decoding
Sungkyun Kim, Jaemin Kim, Dogyeong Yun, Jiho Shin, Junyeol Lee, Jiwon Seo
Abstract
Speculative decoding (SD) improves LLM inference latency by speculatively generating multiple tokens with a small draft model and verifying them with a larger target model. However, when speculation accuracy is low, the overhead from rejected tokens can negate its benefits, especially at large batch sizes. We propose Speculative Verification (SV), an efficient augmentation to SD that predicts speculation accuracy and dynamically adapts the verification length to maximize throughput. SV introduces a small companion model, similar in size to the draft model, to reduce uncertainty in speculation accuracy. By exploiting the information gain from observing the companion distribution, SV reduces wasted verification on rejected tokens and improves decoding efficiency. We evaluate SV across publicly available LLMs on seven NLP tasks using over a hundred combinations of draft, companion, and target models, including 13B–72B target models spanning base, instruction-tuned, and task-specific fine-tuned variants. Compared to target-only decoding, standard SD, and state-of-the-art SD variants, SV consistently delivers higher throughput across batch sizes. SV improves SD performance by up to 1.9×, with an average 1.4× speedup at large batch sizes, showing robust and scalable gains for practical LLM inference.
- Anthology ID: 2026.findings-acl.1809
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 36290–36307
- URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1809/
- Cite (ACL): Sungkyun Kim, Jaemin Kim, Dogyeong Yun, Jiho Shin, Junyeol Lee, and Jiwon Seo. 2026. Speculative Verification: Exploiting Information Gain for Speculative Decoding. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36290–36307, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): Speculative Verification: Exploiting Information Gain for Speculative Decoding (Kim et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1809.pdf