AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling

Zihan Liu, Yang Chen, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping


Abstract
In this paper, we introduce AceMath, a suite of frontier math models that excel at solving complex math problems, along with highly effective reward models capable of evaluating generated solutions and reliably identifying the correct ones. To develop the instruction-tuned math models, we propose a supervised fine-tuning (SFT) process that first achieves competitive performance across general domains, followed by targeted fine-tuning for the math domain using a carefully curated set of prompts and synthetically generated responses. The resulting model, AceMath-72B-Instruct, greatly outperforms Qwen2.5-Math-72B-Instruct, GPT-4o, and Claude-3.5 Sonnet. To develop a math-specialized reward model, we first construct AceMath-RewardBench, a comprehensive and robust benchmark for evaluating math reward models across diverse problems and difficulty levels. We then present a systematic approach to building our math reward models. The resulting model, AceMath-72B-RM, consistently outperforms state-of-the-art reward models. Furthermore, combining AceMath-72B-Instruct with AceMath-72B-RM achieves the highest average rm@8 score across the math reasoning benchmarks.
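The rm@8 score mentioned above refers to best-of-8 selection: sample eight candidate solutions per problem, let the reward model rank them, and check whether its top pick is correct. The sketch below illustrates that evaluation loop under stated assumptions; generate_candidates, score_with_reward_model, and is_correct are hypothetical stand-ins for the policy model, reward model, and answer checker, not functions from the AceMath release.

from typing import Callable, List

def rm_at_n(
    problems: List[str],
    generate_candidates: Callable[[str, int], List[str]],  # policy model sampler (assumed)
    score_with_reward_model: Callable[[str, str], float],  # reward model scorer (assumed)
    is_correct: Callable[[str, str], bool],                # gold-answer checker (assumed)
    n: int = 8,
) -> float:
    """Fraction of problems where the reward model's top-ranked
    candidate, out of n sampled solutions, is a correct one."""
    hits = 0
    for problem in problems:
        # Sample n candidate solutions from the instruction-tuned model.
        candidates = generate_candidates(problem, n)
        # The reward model selects the candidate it scores highest.
        best = max(candidates, key=lambda c: score_with_reward_model(problem, c))
        hits += is_correct(problem, best)
    return hits / len(problems)

With n = 8 this computes rm@8; as n grows, the metric increasingly measures the reward model's ability to identify a correct solution rather than the policy model's single-sample accuracy.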
Anthology ID: 2025.findings-acl.206
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 3993–4015
URL: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.206/
Cite (ACL): Zihan Liu, Yang Chen, Mohammad Shoeybi, Bryan Catanzaro, and Wei Ping. 2025. AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling. In Findings of the Association for Computational Linguistics: ACL 2025, pages 3993–4015, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling (Liu et al., Findings 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.206.pdf