Constrained Decoding with Speculative Lookaheads

Nishanth Sridhar Nakshatri; Shamik Roy; Rajarshi Das; Suthee Chaidaroon; Leonid Boytsov; Rashmi Gangadharaiah

Constrained Decoding with Speculative Lookaheads

Nishanth Sridhar Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, Rashmi Gangadharaiah

Abstract

Constrained decoding with lookahead heuristics (CDLH) is a highly effective method for aligning LLM generations to human preferences. However, the extensive lookahead roll-out operations for each generated token makes CDLH prohibitively expensive, resulting in low adoption in practice. In contrast, common decoding strategies such as greedy decoding are extremely efficient, but achieve very low constraint satisfaction. We propose constrained decoding with speculative lookaheads (CDSL), a technique that significantly improves upon the inference efficiency of CDLH without experiencing the drastic performance reduction seen with greedy decoding. CDSL is motivated by the recently proposed idea of speculative decoding that uses a much smaller draft LLM for generation and a larger target LLM for verification. In CDSL, the draft model is used to generate lookaheads which is verified by a combination of target LLM and task-specific reward functions. This process accelerates decoding by reducing the computational burden while maintaining strong performance. We evaluate CDSL in two constraint decoding tasks with three LLM families and achieve 2.2x to 12.15x speedup over CDLH without significant performance reduction.

Anthology ID:: 2025.naacl-long.239
Volume:: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:: April
Year:: 2025
Address:: Albuquerque, New Mexico
Editors:: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue:: NAACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 4681–4700
Language:
URL:: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.239/
DOI:
Bibkey:
Cite (ACL):: Nishanth Sridhar Nakshatri, Shamik Roy, Rajarshi Das, Suthee Chaidaroon, Leonid Boytsov, and Rashmi Gangadharaiah. 2025. Constrained Decoding with Speculative Lookaheads. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4681–4700, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):: Constrained Decoding with Speculative Lookaheads (Nakshatri et al., NAACL 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.239.pdf

PDF Cite Search Fix data