Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving

Sara Rajaee, Kumar Pratik, Gabriele Cesa, Arash Behboodi


Abstract
The most promising recent methods for AI reasoning require applying variants of reinforcement learning (RL) either on rolled out trajectories from the LLMs, even for the step-wise rewards, or large quantities of human-annotated trajectory data. The reliance on the rolled-out trajectory renders the compute cost and time prohibitively high. In particular, the correctness of a reasoning trajectory can typically only be judged at its completion, leading to sparse rewards in RL or requiring expensive synthetic data generation in expert iteration-like methods. In this work, we focus on the Automatic Theorem Proving (ATP) task and propose a novel verifier-in-the-loop design, which, unlike existing approaches that leverage feedback on the entire reasoning trajectory, employs an automated verifier to give intermediate feedback at each step of the reasoning process. Using Lean as the verifier, we empirically show that the step-by-step local verification produces a global improvement in the model’s reasoning accuracy and efficiency.
Anthology ID:
2025.findings-acl.825
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
16023–16040
Language:
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.findings-acl.825/
DOI:
10.18653/v1/2025.findings-acl.825
Bibkey:
Cite (ACL):
Sara Rajaee, Kumar Pratik, Gabriele Cesa, and Arash Behboodi. 2025. Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving. In Findings of the Association for Computational Linguistics: ACL 2025, pages 16023–16040, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving (Rajaee et al., Findings 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.findings-acl.825.pdf