DEBATE, TRAIN, EVOLVE: Self‐Evolution of Language Model Reasoning

Gaurav Srivastava; Zhenyu Bi; Meng Lu; Xuan Wang

Abstract

Large language models (LLMs) have improved significantly in their reasoning through extensive training on massive datasets. However, relying solely on additional data for improvement is becoming increasingly impractical, highlighting the need for models to autonomously enhance their reasoning without external supervision. In this paper, we propose Debate, Train, Evolve (DTE), a novel ground-truth-free training framework that uses multi-agent debate traces to evolve a single language model. We also introduce a new prompting strategy, Reflect-Critique-Refine, which improves debate quality by explicitly instructing agents to critique and refine their reasoning. Extensive evaluations on seven reasoning benchmarks with six open-weight models show that our DTE framework achieves substantial improvements, with an average accuracy gain of 8.92% on the challenging GSM-PLUS dataset. Furthermore, we observe strong cross-domain generalization, with an average accuracy gain of 5.8% on all other benchmarks, suggesting that our method captures general reasoning capabilities. Our framework code and trained models are publicly available at https://github.com/ctrl-gaurav/Debate-Train-Evolve.

Anthology ID: 2025.emnlp-main.1666
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 32752–32798
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1666/
Cite (ACL): Gaurav Srivastava, Zhenyu Bi, Meng Lu, and Xuan Wang. 2025. DEBATE, TRAIN, EVOLVE: Self‐Evolution of Language Model Reasoning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 32752–32798, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): DEBATE, TRAIN, EVOLVE: Self‐Evolution of Language Model Reasoning (Srivastava et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1666.pdf
Checklist: 2025.emnlp-main.1666.checklist.pdf
