Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity

Dylan Zhang, Justin Wang, Tianran Sun


Abstract
Existing LMs struggle with proof-oriented programming due to data scarcity, which manifest in two key ways: (1) a lack of sufficient corpora for proof-oriented programming languages such as F*, and (2) the absence of large-scale, project-level proof-oriented implementations that can teach the model the intricate reasoning process when performing proof-oriented programming. We present the first on synthetic data augmentation for project level proof oriented programming for both generation and repair. Our method addresses data scarcity by synthesizing basic proof-oriented programming problems for proficiency in that language; incorporating diverse coding data for reasoning capability elicitation and creating new proofs and repair data within existing repositories. This approach enables language models to both synthesize and repair proofs for function- and repository-level code. We show that our fine-tuned 14B parameter model, PoPilot, can exceed the performance of the models that outperforms GPT-4o in project-level proof-oriented programming by 64% relative margin, and can improve GPT-4o’s performance by 54% by repairing its outputs over GPT-4o’s self-repair.
Anthology ID:
2025.findings-acl.1186
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
23101–23118
Language:
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1186/
DOI:
Bibkey:
Cite (ACL):
Dylan Zhang, Justin Wang, and Tianran Sun. 2025. Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity. In Findings of the Association for Computational Linguistics: ACL 2025, pages 23101–23118, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity (Zhang et al., Findings 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1186.pdf