RLMEval: Evaluating Research-Level Neural Theorem Proving

Auguste Poiroux, Antoine Bosselut, Viktor Kunčak


Abstract
Despite impressive results on curated benchmarks, the practical impact of large language models (LLMs) on research-level neural theorem proving and proof autoformalization is still limited. We introduce RLMEval, an evaluation suite for these tasks that targets challenging research-level mathematics drawn from real-world Lean Blueprint formalization projects. Our evaluation of state-of-the-art models on RLMEval, comprising 613 theorems from 6 Lean projects, reveals a significant gap: progress on existing benchmarks does not readily translate to these more realistic settings, with the best model achieving only a 10.3% pass rate. RLMEval provides a new, challenging benchmark designed to guide and accelerate progress in automated reasoning for formal mathematics.
Anthology ID:
2025.findings-emnlp.581
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
10946–10957
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.581/
DOI:
10.18653/v1/2025.findings-emnlp.581
Cite (ACL):
Auguste Poiroux, Antoine Bosselut, and Viktor Kunčak. 2025. RLMEval: Evaluating Research-Level Neural Theorem Proving. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 10946–10957, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
RLMEval: Evaluating Research-Level Neural Theorem Proving (Poiroux et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.581.pdf
Checklist:
2025.findings-emnlp.581.checklist.pdf