TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts

Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, Tong Zhang


Abstract
Proving mathematical theorems using computer-verifiable formal languages like Lean significantly impacts mathematical reasoning. One approach to formal theorem proving involves generating complete proofs using Large Language Models (LLMs) based on Natural Language (NL) proofs. However, due to the scarcity of aligned NL and Formal Language (FL) theorem-proving data, most modern LLMs exhibit suboptimal performance. This scarcity also results in a paucity of methodologies for training LLMs and of techniques to fully utilize their capabilities in composing formal proofs. To address these challenges, this paper proposes **TheoremLlama**, an end-to-end framework that trains a general-purpose LLM to be a Lean4 expert. **TheoremLlama** includes NL-FL dataset generation and bootstrapping methods to obtain an aligned dataset, curriculum learning and block training techniques to train the model, and an iterative proof-writing method to compose Lean4 proofs; these components work together synergistically. Using the dataset generation method in **TheoremLlama**, we provide *Open Bootstrapped Theorems* (OBT), an NL-FL aligned and bootstrapped dataset. Our novel NL-FL bootstrapping method, in which NL proofs are integrated into Lean4 code in the training dataset, leverages the NL reasoning ability of LLMs for formal reasoning. The **TheoremLlama** framework achieves cumulative accuracies of 36.48% and 33.61% on the MiniF2F-Valid and MiniF2F-Test datasets, respectively, surpassing the GPT-4 baselines of 22.95% and 25.41%. Our code, model checkpoints, and the generated dataset are published on GitHub.
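To illustrate the NL-FL bootstrapping idea the abstract describes, here is a minimal sketch of what an aligned training example might look like, with the natural-language proof interleaved as comments inside the Lean4 proof. The theorem below is a hypothetical illustration, not an entry from the OBT dataset:

```lean
-- Hypothetical NL-FL bootstrapped training example (for illustration
-- only; not taken from the OBT dataset). The natural-language proof is
-- interleaved as comments alongside the formal Lean4 tactic steps, so a
-- model trained on such data sees NL and FL reasoning side by side.

-- NL statement: addition of natural numbers is commutative.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  -- NL: this is exactly the standard-library lemma `Nat.add_comm`,
  -- so the formal proof closes by applying it directly.
  exact Nat.add_comm a b
```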
Anthology ID:
2024.emnlp-main.667
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
11953–11974
URL:
https://preview.aclanthology.org/icon-24-ingestion/2024.emnlp-main.667/
DOI:
10.18653/v1/2024.emnlp-main.667
Bibkey:
Cite (ACL):
Ruida Wang, Jipeng Zhang, Yizhen Jia, Rui Pan, Shizhe Diao, Renjie Pi, and Tong Zhang. 2024. TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 11953–11974, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts (Wang et al., EMNLP 2024)
PDF:
https://preview.aclanthology.org/icon-24-ingestion/2024.emnlp-main.667.pdf