Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
Yuan Sui, Yufei He, Tri Cao, Sophia Simeng Han, Yulin Chen, Bryan Hooi
Abstract
Large Language Models (LLMs) often struggle with computational efficiency and error propagation in multi-step reasoning tasks. While recent advances in prompting and post-training have enabled LLMs to perform step-wise reasoning, they still tend to explore unproductive solution paths without effective backtracking or strategy adjustment. In this paper, we propose Meta-Reasoner, a new framework that empowers LLMs to “think about how to think”. It optimizes the inference process by dynamically adapting reasoning strategies in real time. Our approach employs contextual multi-armed bandits (CMABs) to learn an adaptive policy that evaluates the current state of the LLM’s reasoning and determines the strategy most likely to lead to a successful outcome during inference, such as whether to backtrack, switch to a new approach, or restart the problem-solving process. This meta-guidance helps the model avoid exploring unproductive paths during inference and hence improves computational efficiency. We evaluate Meta-Reasoner on math problems (e.g., Game-of-24, TheoremQA) and scientific tasks (e.g., SciBench). Results show that our method outperforms previous SOTA methods by 9–12% in accuracy, while reducing inference time by 28–35% under the same compute budget. Additional experiments on creative writing demonstrate the generalizability of our approach to diverse reasoning-intensive tasks.
- Anthology ID: 2026.findings-acl.649
- Volume: Findings of the Association for Computational Linguistics: ACL 2026
- Month: July
- Year: 2026
- Address: San Diego, California, United States
- Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 13268–13286
- URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.649/
- Cite (ACL): Yuan Sui, Yufei He, Tri Cao, Sophia Simeng Han, Yulin Chen, and Bryan Hooi. 2026. Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 13268–13286, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal): Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models (Sui et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.649.pdf
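The abstract frames strategy selection as a contextual multi-armed bandit problem. A summary of the LLM's current reasoning state acts as the context, the candidate strategies act as the arms, and the observed outcome acts as the reward. The sketch below illustrates that loop using disjoint LinUCB, a standard contextual-bandit algorithm chosen here purely for illustration; the abstract does not say which bandit algorithm, context features, or reward signal Meta-Reasoner actually uses. The arm names come from the abstract's examples, while `LinUCB`, `simulated_reward`, `BEST_FOR_STATE`, and the one-hot state encoding are hypothetical.

```python
import numpy as np

# Arms: the example strategies named in the abstract (illustrative only).
STRATEGIES = ["backtrack", "switch_approach", "restart"]


class LinUCB:
    """Disjoint LinUCB: one ridge-regression reward model per arm (assumed stand-in
    for the paper's unspecified CMAB policy)."""

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha                                 # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]      # per-arm X^T X + I
        self.b = [np.zeros(dim) for _ in range(n_arms)]    # per-arm sum of reward * context

    def select(self, x: np.ndarray) -> int:
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                              # ridge estimate of reward weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)    # uncertainty bonus
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))                      # ties go to the lowest index

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        """Fold the observed (context, reward) pair into the chosen arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Toy simulation (hypothetical): three one-hot "reasoning states", each with exactly
# one strategy that earns reward 1 (e.g. a progress signal) and 0 otherwise.
BEST_FOR_STATE = {0: 0, 1: 1, 2: 2}  # state index -> index into STRATEGIES


def simulated_reward(state: int, arm: int) -> float:
    return 1.0 if BEST_FOR_STATE[state] == arm else 0.0


policy = LinUCB(n_arms=len(STRATEGIES), dim=3, alpha=1.0)
for step in range(30):
    state = step % 3
    x = np.eye(3)[state]                 # context vector for this state
    arm = policy.select(x)
    policy.update(arm, x, simulated_reward(state, arm))

for state in range(3):
    print(state, STRATEGIES[policy.select(np.eye(3)[state])])
```

LinUCB adds an uncertainty bonus to each strategy's estimated reward, so strategies not yet tried in a given state are explored before the policy settles on the one that has earned reward there. In a real system the context would be derived from the LLM's reasoning trace, which this toy does not model.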