Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees

Kun Li; Zenan Xu; Junan Li; Zengrui Jin; Jinghao Deng; Zexuan Qiu; Bo Zhou

Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees

Kun Li, Zenan Xu, Junan Li, Zengrui Jin, Jinghao Deng, Zexuan Qiu, Bo Zhou

Abstract

Tool-Integrated Reasoning has emerged as a key paradigm to augment Large Language Models (LLMs) with computational capabilities, yet integrating tool-use into long Chain-of-Thought (long CoT) remains underexplored, largely due to the scarcity of training data and the challenge of integrating tool-use without compromising the model’s intrinsic long-chain reasoning. In this paper, we introduce DART (Discovery And Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees), a reinforcement learning framework that enables spontaneous tool-use during long CoT reasoning without additional human annotation. DART operates by constructing dynamic rollout trees during training to discover valid tool-use opportunities, branching out at promising positions to explore tool-integrated trajectories. Subsequently, a tree-based process advantage estimation identifies and credits specific sub-trajectories where tool invocation positively contributes to the solution, effectively reinforcing these beneficial behaviors during training. Extensive experiments on challenging benchmarks like AIME and GPQA-Diamond demonstrate that DART significantly outperforms existing methods, successfully harmonizing tool execution with long CoT reasoning.

Anthology ID:: 2026.acl-long.649
Volume:: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: ACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 14265–14283
Language:
URL:: https://preview.aclanthology.org/ingest-acl/2026.acl-long.649/
DOI:
Bibkey:
Cite (ACL):: Kun Li, Zenan Xu, Junan Li, Zengrui Jin, Jinghao Deng, Zexuan Qiu, and Bo Zhou. 2026. Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14265–14283, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: Discovery and Reinforcement of Tool-Integrated Reasoning Chains via Rollout Trees (Li et al., ACL 2026)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-acl/2026.acl-long.649.pdf
Checklist:: 2026.acl-long.649.checklist.pdf

PDF Cite Search Checklist Fix data