Tool learning via Inference-time Scaling and Cycle Verifier

Xiaobo Liang, Wenjin Xie, Juntao Li, Wanfu Wang, Yibin Chen, Kehai Chen, Min Zhang


Abstract
In inference-time scaling, Chain-of-Thought (CoT) plays a crucial role in enabling large language models (LLMs) to exhibit reasoning capabilities. However, in many scenarios, high-quality CoT data is scarce or even unavailable. In such cases, STaR-like methods can help LLMs synthesize CoT based on user queries and responses, but they inevitably suffer from the risk of compounding errors. In this work, we tackle an even more challenging scenario: tool learning in the absence of user queries. We design a data scaling method using back-translation, which establishes an inference cycle to synthesize both user queries and CoT data. To reduce compounding errors at inference time, we introduce two rule-based verifiers to assess the validity of the synthesized CoT data. In particular, the Cycle Verifier facilitates performance improvement by continuously accumulating new data over multiple iterations. Our approach achieves a 75.4% pass rate and a 79.6% win rate using small models (7B) in StableToolBench. Notably, these results are obtained exclusively from self-synthesized high-quality data, without relying on external supervision or expert trajectories for warm-up.
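The back-translation cycle described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: every function name, the trajectory format, and the verifier rules are assumptions standing in for LLM calls and the paper's actual rule-based checks.

```python
# Hypothetical sketch of the back-translation inference cycle: starting from
# tool-call trajectories (no user queries), synthesize a query, synthesize a
# CoT, and keep only data that passes a rule-based cycle-consistency check.

def back_translate_query(trajectory):
    # Assumed stand-in for an LLM call that synthesizes a user query
    # from an observed tool-call trajectory (response-side data only).
    return f"query for: {trajectory['final_answer']}"

def synthesize_cot(query, trajectory):
    # Assumed stand-in for an LLM call that produces a chain of thought
    # linking the synthesized query to the tool calls.
    steps = [f"call {c['tool']}" for c in trajectory["calls"]]
    return {"query": query, "steps": steps,
            "answer": trajectory["final_answer"]}

def cycle_verifier(cot, trajectory):
    # Illustrative rule-based check: the CoT must cover every tool call and
    # reach the same final answer, so the cycle (response -> query -> CoT)
    # stays consistent and compounding errors are filtered out.
    covers_all_calls = len(cot["steps"]) == len(trajectory["calls"])
    answer_matches = cot["answer"] == trajectory["final_answer"]
    return covers_all_calls and answer_matches

def accumulate(trajectories, rounds=2):
    # Over multiple iterations, accumulate only verified CoT data,
    # mirroring the iterative data scaling described in the abstract.
    dataset = []
    for _ in range(rounds):
        for traj in trajectories:
            query = back_translate_query(traj)
            cot = synthesize_cot(query, traj)
            if cycle_verifier(cot, traj):
                dataset.append(cot)
    return dataset
```

In the paper, the stand-in functions would be model generations and the verifier would apply the two rule-based checks; the sketch only shows the control flow of synthesize-then-verify-then-accumulate.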
Anthology ID:
2025.findings-acl.1266
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
24658–24671
URL:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.1266/
Cite (ACL):
Xiaobo Liang, Wenjin Xie, Juntao Li, Wanfu Wang, Yibin Chen, Kehai Chen, and Min Zhang. 2025. Tool learning via Inference-time Scaling and Cycle Verifier. In Findings of the Association for Computational Linguistics: ACL 2025, pages 24658–24671, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Tool learning via Inference-time Scaling and Cycle Verifier (Liang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/display_plenaries/2025.findings-acl.1266.pdf