Hiroaki Kingetsu


2026

We present a lookahead quality gate (verifier) for speculative decoding with reasoning (chain-of-thought) language models. The gate accepts the longest reliable prefix of each k-token (block-wise) lookahead draft. Unlike token-level likelihood search, which is myopic and often rewards verbosity, or tree-level sampling methods, which trade accuracy for latency, our approach operates at an intermediate granularity: it uses only the base model’s hidden states to compute a geometry-based quality score for each prefix, then accepts the longest prefix whose score exceeds a quantile-calibrated threshold estimated from unlabeled prompts. The method integrates seamlessly with speculative/blockwise decoding and adds minimal runtime overhead, requiring no auxiliary heads, reward models, or fine-tuning. On math and science benchmarks, it improves accuracy over sampling baselines while achieving 2.6–7.9× faster generation.
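The acceptance rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `prefix_scores` function below uses a placeholder geometric statistic (mean cosine similarity between consecutive hidden states), since the abstract does not specify the actual quality score, and all function names are hypothetical. The quantile calibration step, however, follows the stated recipe of estimating a threshold from scores gathered on unlabeled prompts.

```python
import numpy as np

def calibrate_threshold(calibration_scores, q=0.1):
    # Quantile-calibrated threshold: the q-quantile of prefix scores
    # collected by running the gate over unlabeled prompts.
    return float(np.quantile(calibration_scores, q))

def prefix_scores(hidden_states):
    # Placeholder geometry-based score (illustrative only): the mean
    # cosine similarity of consecutive base-model hidden states within
    # the prefix. Entry i scores the prefix of length i + 2.
    h = hidden_states / np.linalg.norm(hidden_states, axis=-1, keepdims=True)
    sims = np.sum(h[:-1] * h[1:], axis=-1)            # consecutive cosine similarities
    return np.cumsum(sims) / np.arange(1, len(sims) + 1)

def accept_longest_prefix(hidden_states, threshold):
    # Return the length of the longest draft prefix whose score
    # exceeds the calibrated threshold (0 if none qualifies).
    scores = prefix_scores(hidden_states)
    above = np.nonzero(scores >= threshold)[0]
    return int(above[-1]) + 2 if above.size else 0
```

In a speculative-decoding loop, the accepted prefix length determines how many draft tokens are committed before the base model resumes; scoring reuses hidden states already computed during verification, which is why the gate adds little overhead.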