Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors

Zhiwei Zhang; Fei Zhao; Rui Wang; Zezhong Wang; Bin Liang (梁斌); Jiakang Wang; Yao Hu; Shaosheng Cao; Kam-Fai Wong

Abstract

Large language models (LLMs) can call tools effectively, yet they remain brittle in multi-turn execution: after a tool-call error, smaller models often fall into repetitive invalid re-invocations instead of interpreting the feedback and recovering. This failure mode persists because current training paradigms do not explicitly teach models how to recover from execution errors. In particular, standard reinforcement learning (RL) collapses rich failure experience into sparse negative rewards, while pre-collected error-correction datasets become mismatched to the policy’s evolving failure modes. To bridge this gap, we propose Fission-GRPO, a framework that converts execution errors into on-policy corrective supervision within the RL training loop. Our core mechanism fissions each failed trajectory into a new training instance by augmenting it with diagnostic feedback from a fine-tuned Error Simulator, then resampling multiple recovery rollouts on-policy. This enables the model to learn from the precise errors it makes during exploration, rather than from static, pre-collected error cases. On BFCL v4 Multi-Turn, Fission-GRPO improves the error recovery rate of Qwen3-8B by 5.7 percentage points and overall accuracy by 4.0 percentage points (from 42.75% to 46.75%), outperforming both RL baselines and specialized tool-use agents. The method further generalizes to TAU-Bench and TAU2-Bench, achieving leading results across most settings with gains of up to 17.4%.
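
The fission mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the page gives no reward design, prompt format, or update rule, so every name here (`Rollout`, `fission`, `fission_step`, the `[tool error]` tag, and the "reward <= 0 means failure" criterion) is a hypothetical stand-in. The only standard piece is the GRPO-style advantage, which standardises rewards within one sampling group. The policy, reward function, and Error Simulator are passed in as plain callables.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List, Tuple


@dataclass
class Rollout:
    prompt: str      # context the policy was conditioned on
    response: str    # sampled tool call or final answer
    reward: float    # scalar reward; <= 0 counts as a failure (assumption)


Group = Tuple[List[Rollout], List[float]]  # rollouts and their advantages


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardise rewards within one sampling group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def fission(failed: Rollout, simulate_error: Callable[[Rollout], str]) -> str:
    """Build a new prompt from a failed rollout plus simulated diagnostic feedback."""
    feedback = simulate_error(failed)  # stand-in for the Error Simulator
    return f"{failed.prompt}\n[assistant] {failed.response}\n[tool error] {feedback}"


def fission_step(
    prompt: str,
    sample: Callable[[str], str],            # stand-in for the current policy
    reward_fn: Callable[[str, str], float],  # stand-in for the task reward
    simulate_error: Callable[[Rollout], str],
    group_size: int = 4,
) -> List[Group]:
    """Sample one group, then resample a recovery group for each failed rollout."""

    def rollout_group(p: str) -> Group:
        rollouts = []
        for _ in range(group_size):
            response = sample(p)
            rollouts.append(Rollout(p, response, reward_fn(p, response)))
        return rollouts, group_relative_advantages([r.reward for r in rollouts])

    first = rollout_group(prompt)
    groups = [first]
    for r in first[0]:
        if r.reward <= 0:  # failed trajectory: fission into a recovery instance
            groups.append(rollout_group(fission(r, simulate_error)))
    return groups
```

Each returned group would feed a policy-gradient update in a real training loop, which is omitted here. Because the recovery prompts are built from the policy's own failures and resampled from the same policy, the corrective data stays on-policy, which is the contrast with static error-correction datasets that the abstract draws.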

Anthology ID: 2026.acl-long.1880
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 40477–40491
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1880/
Cite (ACL): Zhiwei Zhang, Fei Zhao, Rui Wang, Zezhong Wang, Bin Liang, Jiakang Wang, Yao Hu, Shaosheng Cao, and Kam-Fai Wong. 2026. Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 40477–40491, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors (Zhang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1880.pdf
Checklist: 2026.acl-long.1880.checklist.pdf
