Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch

Yirong Zeng; Xiao Ding; Yutai Hou; Yuxian Wang; Li Du; Juyi Dai; Qiuyang Ding; Duyu Tang; Dandan Tu; Weiwen Liu; Bing Qin (秦兵); Ting Liu (刘挺)

doi:10.18653/v1/2025.findings-emnlp.485

Abstract

Training tool-augmented LLMs has emerged as a promising approach to enhancing language models’ capabilities for complex tasks. The current supervised fine-tuning (SFT) paradigm relies on constructing extensive domain-specific datasets to train models. However, this approach often struggles to generalize to unfamiliar or intricate tool-use scenarios. Recently, the reinforcement learning (RL) paradigm has been shown to endow LLMs with superior reasoning and generalization abilities. In this work, we address a key question: Can pure RL be used to effectively elicit a model’s intrinsic reasoning capabilities and enhance tool-agnostic generalization? We propose a dynamic generalization-guided reward design for rule-based RL, which progressively shifts rewards from exploratory to exploitative tool-use patterns. Based on this design, we introduce the Tool-Zero series of models. These models are trained to enable LLMs to autonomously utilize general tools by directly scaling up RL from Zero models (i.e., base models without post-training). Experimental results demonstrate that our models achieve over 7% performance improvement compared to both SFT and RL-with-SFT models under the same experimental settings. These gains are consistently replicated across cross-dataset and intra-dataset evaluations, validating the effectiveness and robustness of our methods.
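
The abstract does not spell out the reward itself, so the following is only an illustrative sketch of what a rule-based reward that shifts from exploratory to exploitative signals could look like. Everything in it is an assumption made for illustration and is not taken from the paper: the JSON tool-call format, the linear schedule, and the hypothetical helper names `schedule_weight`, `format_reward`, `exact_reward`, and `dynamic_reward`.

```python
import json


def schedule_weight(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule: 0.0 at the start of training, 1.0 at the end."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def format_reward(output: str) -> float:
    """Lenient (exploratory) rule: 1.0 if the output parses as a JSON tool call
    with a string "name" and a dict "arguments", whatever their values."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    well_formed = (
        isinstance(call, dict)
        and isinstance(call.get("name"), str)
        and isinstance(call.get("arguments"), dict)
    )
    return 1.0 if well_formed else 0.0


def exact_reward(output: str, gold: dict) -> float:
    """Strict (exploitative) rule: 1.0 only if the parsed call equals the gold call."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    return 1.0 if call == gold else 0.0


def dynamic_reward(output: str, gold: dict, step: int, total_steps: int) -> float:
    """Blend the two rules; the strict rule gains weight as training progresses."""
    w = schedule_weight(step, total_steps)
    return (1.0 - w) * format_reward(output) + w * exact_reward(output, gold)


if __name__ == "__main__":
    # Assumed example data: a gold tool call and a well-formed but wrong prediction.
    gold = {"name": "get_weather", "arguments": {"city": "Paris"}}
    near_miss = json.dumps({"name": "get_weather", "arguments": {"city": "Rome"}})
    for step in (0, 50, 100):
        print(step, dynamic_reward(near_miss, gold, step, total_steps=100))
```

Under these assumptions, a well-formed but incorrect call is rewarded early in training and progressively loses that reward, while an exactly correct call keeps full reward throughout.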

Anthology ID: 2025.findings-emnlp.485
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9135–9147
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.485/
DOI: 10.18653/v1/2025.findings-emnlp.485
Cite (ACL): Yirong Zeng, Xiao Ding, Yutai Hou, Yuxian Wang, Li Du, Juyi Dai, Qiuyang Ding, Duyu Tang, Dandan Tu, Weiwen Liu, Bing Qin, and Ting Liu. 2025. Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9135–9147, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Tool Zero: Training Tool-Augmented LLMs via Pure RL from Scratch (Zeng et al., Findings 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.485.pdf
Checklist: 2025.findings-emnlp.485.checklist.pdf
