Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

Xiao Liang (梁霄); Zhong-Zhi Li; Zhenghao Lin; Eric Hanchen Jiang; Hengyuan Zhang; Yelong Shen; Kai-Wei Chang; Ying Nian Wu; Yeyun Gong; Weizhu Chen

Abstract

Large language models (LLMs) have demonstrated strong reasoning capabilities through step-by-step chain-of-thought (CoT) reasoning. Nevertheless, at the limits of model capability, CoT often proves insufficient, and its strictly sequential nature constrains test-time scalability. A potential alternative is divide-and-conquer (DAC) reasoning, which decomposes a complex problem into subproblems to facilitate more effective exploration of the solution space. Although promising, our analysis reveals a fundamental misalignment between general-purpose post-training and DAC-style inference, which limits the model’s capacity to fully leverage this potential. To bridge this gap and fully unlock LLMs’ reasoning capabilities on the most challenging tasks, we propose an end-to-end reinforcement learning (RL) framework to enhance their DAC-style reasoning capacity. At each step, the policy decomposes a problem into a group of subproblems, solves them sequentially, and addresses the original problem conditioned on the subproblem solutions, with both decomposition and solution integrated into RL training. Under comparable training settings, our DAC-style framework endows the model with a higher performance ceiling and stronger test-time scalability, surpassing CoT by 8.6% in Pass@1 and 6.3% in Pass@32 on competition-level benchmarks. The code is available at the [provided link](https://github.com/MasterVito/DAC-RL).
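The inference loop described in the abstract (decompose, solve the subproblems in order, then answer the original problem conditioned on their solutions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `dac_solve`, the `generate` callable, the `max_subproblems` cap, and all prompt wording are hypothetical, and the sketch assumes each subproblem prompt also sees the earlier subproblem solutions. The RL training of the policy is not covered here.

```python
from typing import Callable, List


def dac_solve(problem: str, generate: Callable[[str], str], max_subproblems: int = 3) -> str:
    """Divide-and-conquer inference sketch: decompose, solve in order, then answer.

    `generate` is a hypothetical stand-in for any text-in, text-out policy call.
    """
    # Divide: ask the policy for subproblems, one per line (prompt wording is assumed).
    raw = generate(
        f"Decompose into at most {max_subproblems} subproblems, one per line:\n{problem}"
    )
    subproblems = [line.strip() for line in raw.splitlines() if line.strip()]
    subproblems = subproblems[:max_subproblems]

    # Conquer: solve subproblems sequentially; each prompt includes earlier solutions
    # (an assumption of this sketch).
    context: List[str] = []
    for sub in subproblems:
        prompt = "\n".join(context + [f"Solve: {sub}"])
        context.append(f"Subproblem: {sub}\nSolution: {generate(prompt)}")

    # Combine: answer the original problem conditioned on the subproblem solutions.
    return generate("\n".join(context + [f"Original problem: {problem}\nFinal answer:"]))
```

Any model client can be wrapped as `generate`; for a dry run, a stub function that returns canned strings is enough to trace the sequence of prompts.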

Anthology ID: 2026.acl-long.1588
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 34402–34427
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1588/
Cite (ACL): Xiao Liang, Zhong-Zhi Li, Zhenghao Lin, Eric Hanchen Jiang, Hengyuan Zhang, Yelong Shen, Kai-Wei Chang, Ying Nian Wu, Yeyun Gong, and Weizhu Chen. 2026. Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 34402–34427, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability (Liang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1588.pdf
Checklist: 2026.acl-long.1588.checklist.pdf
