QDTSynth: Quality-Driven Formal Theorem Synthesis for Enhancing Proving Performance of LLMs

Lei Wang, Ruobing Zuo, Gaolei He, Jianlin Wang, Zhengfeng Yang


Abstract
Automated theorem proving is an important and challenging task. Although large language models (LLMs) have demonstrated remarkable potential in mathematical reasoning, their performance in formal theorem proving remains constrained by the scarcity of high-quality supervised fine-tuning (SFT) data. To address this limitation, we propose a Quality-Driven Theorem Synthesis method (QDTSynth) for Lean4. During statement synthesis, we enhance Monte Carlo Tree Search (MCTS) with an adaptive adjustment mechanism that dynamically optimizes the search strategy as statements are synthesized. In addition, we propose diversity screening and a self-assessment method to select theorems that exhibit both diversity and high quality from the initially synthesized statements, yielding a high-quality Lean4 theorem dataset. After fine-tuning three open-source LLMs on our synthetic dataset, experiments on the miniF2F benchmark demonstrate that QDTSynth significantly improves their performance on theorem-proving tasks. Our work offers a promising new direction for the synthesis of high-quality formal mathematical theorems.
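For readers unfamiliar with Lean4, the kind of formal statement such a dataset contains pairs a declared proposition with a machine-checkable proof. The following toy example is purely illustrative (the theorem name and content are our own, not drawn from the QDTSynth dataset):

```lean
-- Illustrative Lean4 theorem: addition over natural numbers commutes.
-- The proof term appeals to the standard library lemma Nat.add_comm.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Synthesizing statements of this shape at scale, and then filtering them for diversity and quality, is what produces SFT data suitable for fine-tuning a prover LLM.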
Anthology ID:
2025.acl-long.714
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
14683–14698
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.714/
Cite (ACL):
Lei Wang, Ruobing Zuo, Gaolei He, Jianlin Wang, and Zhengfeng Yang. 2025. QDTSynth: Quality-Driven Formal Theorem Synthesis for Enhancing Proving Performance of LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14683–14698, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
QDTSynth: Quality-Driven Formal Theorem Synthesis for Enhancing Proving Performance of LLMs (Wang et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.714.pdf