Kiran Purohit
2026
From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning
Kiran Purohit | Ramasuri Narayanam | Soumyabrata Pal
Findings of the Association for Computational Linguistics: ACL 2026
Speculative decoding (SD) accelerates large language model inference by allowing a lightweight draft model to propose outputs that a stronger target model verifies. However, its token-centric nature allows erroneous steps to propagate. Prior approaches mitigate this using external reward models, but these incur additional latency and computational overhead and limit generalizability. We propose SpecGuard, a verification-aware speculative decoding framework that performs step-level verification using only model-internal signals. At each step, SpecGuard samples multiple draft candidates and selects the most consistent step, which is then validated using an ensemble of two lightweight model-internal signals: (i) an attention-based grounding score that measures attribution to the input and previously accepted steps, and (ii) a log-probability-based score that captures token-level confidence. These signals jointly determine whether a step is accepted or recomputed using the target, allocating compute selectively. Experiments across a range of reasoning benchmarks show that SpecGuard improves accuracy by 3.6% while reducing latency by ~11%, outperforming both SD and reward-guided SD.
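The accept-or-recompute loop described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how consistency is measured or how the two signals are combined, so the majority vote over candidate texts, the weighted average, and the names DraftStep, most_consistent, confidence, verify_step, weight, and threshold are all assumptions made for illustration. The grounding score is taken as a precomputed number rather than derived from real attention maps.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class DraftStep:
    text: str                    # candidate reasoning step proposed by the draft model
    grounding: float             # assumed attention-based grounding score in [0, 1]
    token_logprobs: List[float]  # per-token log-probabilities of the step


def most_consistent(candidates: Sequence[DraftStep]) -> DraftStep:
    """Return the first candidate whose text occurs most often.

    Majority vote is an assumed stand-in for the paper's consistency criterion.
    """
    counts = Counter(c.text for c in candidates)
    best_text, _ = counts.most_common(1)[0]
    return next(c for c in candidates if c.text == best_text)


def confidence(step: DraftStep) -> float:
    """Geometric-mean token probability, i.e. exp of the mean log-probability."""
    if not step.token_logprobs:
        return 0.0
    return math.exp(sum(step.token_logprobs) / len(step.token_logprobs))


def verify_step(
    candidates: Sequence[DraftStep],
    target_step: Callable[[], str],
    weight: float = 0.5,      # assumed mixing weight between the two signals
    threshold: float = 0.6,   # assumed acceptance threshold
) -> Tuple[str, bool]:
    """Accept the chosen draft step if the combined score is high enough;
    otherwise fall back to the (more expensive) target model."""
    chosen = most_consistent(candidates)
    score = weight * chosen.grounding + (1.0 - weight) * confidence(chosen)
    if score >= threshold:
        return chosen.text, True
    return target_step(), False
```

In this sketch the target model is called only when the combined score falls below the threshold, which mirrors the selective compute allocation the abstract describes; in practice the weight and threshold would need to be tuned.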
2024
EXPLORA: Efficient Exemplar Subset Selection for Complex Reasoning
Kiran Purohit | Venktesh V | Raghuram Devalla | Krishna Mohan Yerragorla | Sourangshu Bhattacharya | Avishek Anand
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Answering reasoning-based complex questions over text and hybrid sources, including tables, is a challenging task. Recent advances in large language models (LLMs) have enabled in-context learning (ICL), allowing LLMs to acquire proficiency in a specific task using only a few demonstration samples (exemplars). A critical challenge in ICL is the selection of optimal exemplars, which can be either task-specific (static) or test-example-specific (dynamic). Static exemplars provide faster inference times and increased robustness across a distribution of test examples. In this paper, we propose an algorithm for static exemplar subset selection for complex reasoning tasks. We introduce EXPLORA, a novel exploration method designed to estimate the parameters of the scoring function, which evaluates exemplar subsets without incorporating confidence information. EXPLORA significantly reduces the number of LLM calls to ~11% of those required by state-of-the-art methods and achieves a substantial performance improvement of 12.24%. We open-source our code and data (https://github.com/kiranpurohit/EXPLORA).