Debate4MATH: Multi-Agent Debate for Fine-Grained Reasoning in Math

Shaowei Zhang, Deyi Xiong


Abstract
Large language models (LLMs) have demonstrated impressive performance in reasoning. However, existing data annotation methods usually suffer from high annotation costs and a lack of effective automatic validation. To address these issues, we propose a Fine-grained Multi-Agent Debate framework (FMAD) and MMATH-Data, a dataset created by FMAD that consists of 46K reasoning steps. By prompting multiple agents to debate, FMAD assesses the contribution of each reasoning step to the final solution, with labels derived from the judge’s confidence score and the winner’s position. To facilitate mathematical reasoning and to evaluate FMAD and MMATH-Data, we further propose two key components: a Multi-Agent Debate Reward Model (MRM) trained on MMATH-Data, which serves as a reward model providing robust feedback during optimization, and MMATH-LLM, a model designed specifically for mathematical reasoning. MMATH-LLM is fine-tuned via reinforcement learning with supervised feedback from MRM to improve its mathematical reasoning capabilities. Extensive experiments demonstrate that our model achieves 83.4% accuracy on the GSM8K dataset and 45.1% on the MATH dataset, outperforming state-of-the-art methods by 1.2% and 3.5%, respectively. All data and code will be made available on GitHub.
Anthology ID:
2025.findings-acl.862
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
16810–16824
URL:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.862/
Cite (ACL):
Shaowei Zhang and Deyi Xiong. 2025. Debate4MATH: Multi-Agent Debate for Fine-Grained Reasoning in Math. In Findings of the Association for Computational Linguistics: ACL 2025, pages 16810–16824, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Debate4MATH: Multi-Agent Debate for Fine-Grained Reasoning in Math (Zhang & Xiong, Findings 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.862.pdf