Speculative Verification: Exploiting Information Gain for Speculative Decoding

Sungkyun Kim, Jaemin Kim, Dogyeong Yun, Jiho Shin, Junyeol Lee, Jiwon Seo


Abstract
Speculative decoding (SD) improves LLM inference latency by speculatively generating multiple tokens with a small draft model and verifying them with a larger target model. However, when speculation accuracy is low, the overhead from rejected tokens can negate its benefits, especially at large batch sizes. We propose Speculative Verification (SV), an efficient augmentation to SD that predicts speculation accuracy and dynamically adapts the verification length to maximize throughput. SV introduces a small companion model, similar in size to the draft model, to reduce uncertainty in speculation accuracy. By exploiting the information gain from observing the companion distribution, SV reduces wasted verification on rejected tokens and improves decoding efficiency. We evaluate SV across publicly available LLMs on seven NLP tasks using over a hundred combinations of draft, companion, and target models, including 13B–72B target models spanning base, instruction-tuned, and task-specific fine-tuned variants. Compared to target-only decoding, standard SD, and state-of-the-art SD variants, SV consistently delivers higher throughput across batch sizes. SV improves SD performance by up to 1.9×, with an average 1.4× speedup at large batch sizes, showing robust and scalable gains for practical LLM inference.
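The trade-off the abstract describes, between verifying more draft tokens and wasting work on rejected ones, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the linear cost model (`base_cost`, `per_token_cost`), and the assumption that per-position acceptance probabilities are already available (for example, from some predictor) are all hypothetical, and the draft model's own cost is ignored.

```python
from itertools import accumulate
from operator import mul


def expected_tokens(accept_probs, k):
    """Expected tokens produced per step when verifying the first k draft tokens.

    accept_probs[i] is an assumed probability that draft token i is accepted,
    given that all earlier draft tokens were accepted. The leading 1.0 is the
    token the target model always contributes in standard speculative decoding.
    """
    prefix_products = accumulate(accept_probs[:k], mul)
    return 1.0 + sum(prefix_products)


def choose_verify_length(accept_probs, base_cost=1.0, per_token_cost=0.25):
    """Pick the verification length k that maximizes expected tokens per unit cost.

    The cost model (base_cost + per_token_cost * k) is a hypothetical stand-in
    for a compute-bound target model at large batch sizes.
    """
    best_k, best_rate = 0, 1.0 / base_cost  # k = 0: no draft tokens verified
    for k in range(1, len(accept_probs) + 1):
        rate = expected_tokens(accept_probs, k) / (base_cost + per_token_cost * k)
        if rate > best_rate:
            best_k, best_rate = k, rate
    return best_k, best_rate


if __name__ == "__main__":
    # Confident early tokens, uncertain later ones: verification stops early.
    print(choose_verify_length([0.9, 0.8, 0.3, 0.2]))
    # Uniformly low predicted accuracy: verifying no draft tokens is best.
    print(choose_verify_length([0.2, 0.2]))
```

Under these assumed costs, the first call selects a verification length of 2 and the second selects 0, which mirrors the abstract's point that verification length should shrink when predicted speculation accuracy is low.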
Anthology ID:
2026.findings-acl.1809
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
36290–36307
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1809/
Cite (ACL):
Sungkyun Kim, Jaemin Kim, Dogyeong Yun, Jiho Shin, Junyeol Lee, and Jiwon Seo. 2026. Speculative Verification: Exploiting Information Gain for Speculative Decoding. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36290–36307, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Speculative Verification: Exploiting Information Gain for Speculative Decoding (Kim et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1809.pdf
Checklist:
 2026.findings-acl.1809.checklist.pdf