Shaowei Zhang
2025
BackMATH: Towards Backward Reasoning for Solving Math Problems Step by Step
Shaowei Zhang | Deyi Xiong
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
Large language models (LLMs) have achieved impressive results in reasoning, particularly in multi-step reasoning tasks. However, when faced with more complex mathematical problems, the performance of LLMs drops significantly. To address this issue, in this paper we propose a backward reasoning dataset, BackMATH-Data. The dataset comprises approximately 14K backward reasoning problems and 100K reasoning steps. It follows a result-oriented approach, constructing backward reasoning problems by swapping the reasoning results with specific solving conditions in the original problems. Additionally, we introduce the Backward-reasoning Process-supervision Reward Model (BackPRM) and BackMATH-LLM. BackPRM supervises the quality of the generated backward reasoning problems, while BackMATH-LLM is designed for mathematical reasoning. BackMATH-LLM is fine-tuned and enhanced through reinforcement learning, supervising the quality of backward reasoning problems and providing feedback on reasoning steps, thereby improving the mathematical reasoning capabilities of LLMs. Extensive experiments demonstrate that our model achieves an accuracy of 68.1% on the GSM8K dataset and 21.9% on the MATH dataset, exceeding the SOTA by 1.6% and 2.1%, respectively.
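The result-oriented swap described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function name `make_backward_problem`, the placeholder symbol `x`, and the template wording are hypothetical and not taken from the paper's actual construction procedure.

```python
# Hypothetical sketch of result-oriented backward problem construction:
# mask one solving condition in the original problem, reveal the final
# answer instead, and ask the model to recover the masked condition.
def make_backward_problem(problem: str, condition: str, answer: str) -> dict:
    """Swap a known condition with the reasoning result (illustrative only)."""
    masked = problem.replace(condition, "x")  # hide the chosen condition
    backward_question = (
        f"{masked} The answer to the original question is {answer}. "
        "What is the value of x?"
    )
    # The masked condition becomes the target of the backward problem.
    return {"question": backward_question, "target": condition}

forward = "Tom has 3 apples and buys 5 more. How many apples does he have?"
backward = make_backward_problem(forward, condition="5", answer="8")
```

Here the forward problem's condition "5" is hidden and its answer "8" is exposed, turning answer-finding into condition-recovery.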
Debate4MATH: Multi-Agent Debate for Fine-Grained Reasoning in Math
Shaowei Zhang | Deyi Xiong
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) have demonstrated impressive performance in reasoning. However, existing data annotation methods usually suffer from high annotation cost and the lack of effective automatic validation. To address these issues, we propose a Fine-grained Multi-Agent Debate framework (FMAD) and MMATH-Data, a dataset created by FMAD, which consists of 46K reasoning steps. By prompting multiple agents to debate, FMAD assesses the contribution of each reasoning step to the final solution, with labels based on the judge's confidence score and the winner's position. To facilitate reasoning in math and examine FMAD and MMATH-Data, we further propose two key components: a Multi-Agent Debate Reward Model (MRM) trained on MMATH-Data, which serves as a reward model to provide robust feedback during the optimization process, and MMATH-LLM, a model designed specifically for mathematical reasoning. MMATH-LLM is fine-tuned using reinforcement learning with supervised feedback from MRM, aiming to improve its mathematical reasoning capabilities. Extensive experiments demonstrate that our model achieves 83.4% accuracy on the GSM8K dataset and 45.1% on the MATH dataset, outperforming the state-of-the-art methods by 1.2% and 3.5%, respectively. All data and code will be available soon on GitHub.
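The abstract's step labeling, based on the judge's confidence score and the winner's position, can be sketched as follows. This is a minimal sketch under assumptions: the function name `label_step`, the signed scalar label, and the [-1, 1] range are hypothetical illustrations, not the paper's actual labeling rule.

```python
# Hypothetical sketch of FMAD-style step labeling: after agents debate a
# reasoning step, a judge emits a confidence score, and the side that wins
# the debate determines the sign of the step's label.
def label_step(judge_confidence: float, winner_supports_step: bool) -> float:
    """Return a signed step label scaled by judge confidence (illustrative)."""
    if not 0.0 <= judge_confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    sign = 1.0 if winner_supports_step else -1.0
    return sign * judge_confidence
```

Under this sketch, a step defended by the winning side with judge confidence 0.9 receives label 0.9, while a step the winner argued against with confidence 0.7 receives -0.7; such signed labels could then serve as training targets for a step-level reward model.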