MARS²: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation

Pengfei Li; Shijie Wang; Fangyuan Li; Yikun Fu; Kaifeng Liu; Kaiyan Zhang; Dazhi Zhang; Yuqiang Li; Biqing Qi; Bowen Zhou

Abstract

Reinforcement learning (RL) paradigms have demonstrated strong performance on reasoning-intensive tasks such as code generation. However, limited trajectory diversity often leads to diminishing returns, which constrains the achievable performance ceiling. Search-enhanced RL alleviates this issue by introducing structured exploration, but it remains constrained by the policy priors of a single agent. Meanwhile, leveraging multiple interacting policies can yield more diverse exploratory signals, but existing approaches are typically decoupled from structured search. We propose MARS² (Multi-Agent Reinforced Tree-Search Scaling), a unified RL framework in which multiple independently optimized agents collaborate within a shared tree-structured search environment. MARS² models the search tree as a learnable multi-agent interaction environment, enabling heterogeneous agents to collaboratively generate and refine candidate solutions within a shared search topology. To support learning, we introduce a path-level group advantage formulation based on tree-consistent reward shaping, which facilitates credit assignment across complex search trajectories. Experiments on code generation benchmarks show that MARS² consistently improves performance across diverse model combinations and training settings, demonstrating the effectiveness of coupling multi-agent collaboration with tree search for enhancing reinforcement learning. Our code is publicly available at https://github.com/TsinghuaC3I/MARTI.

Anthology ID: 2026.acl-long.1538
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 33319–33335
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1538/
Cite (ACL): Pengfei Li, Shijie Wang, Fangyuan Li, Yikun Fu, Kaifeng Liu, Kaiyan Zhang, Dazhi Zhang, Yuqiang Li, Biqing Qi, and Bowen Zhou. 2026. MARS2: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33319–33335, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): MARS2: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation (Li et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1538.pdf
Checklist: 2026.acl-long.1538.checklist.pdf
