Training Verifier to Assessing Complex Real-World Tool-Use Trajectories

Linzhuang Sun; Mingyang Chen; Hao Liang; Tianpeng Li; Zhou Yijie; Chenzheng Zhu; Tianyu Guo; Huanyao Zhang; Jingxuan Wei; Bihui Yu; Fan Yang; Wentao Zhang

Abstract

Training effective AI agents for real-world tool-use interactions requires data that faithfully captures the dynamics of human–agent collaboration. However, such data is scarce, and existing methods often resort to synthetic data generation. The inherently dynamic and complex nature of user–agent interactions makes ensuring data quality particularly challenging. Current verification approaches are typically entangled with the synthesis process itself, resulting in complicated implementations that undermine both reproducibility and scalability. To address this, we introduce Tool-Verifier-7B, a plug-and-play framework for data quality control in tool-use scenarios. Building on this verifier and our data synthesis strategy, we construct the Tool-Verify dataset, which contains 3,295 curated samples. To directly assess verifier performance, we further release Tool-V-Bench, a benchmark of 165 human-validated trajectories spanning diverse interaction complexities. Comprehensive experiments show that Tool-Verifier-7B surpasses Qwen2.5-72B-Instruct on Tool-V-Bench. Moreover, the Tool-Verify dataset achieves superior performance compared to the previous APIGen-MT dataset.

Anthology ID: 2026.findings-acl.1647
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 32922–32936
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1647/
Cite (ACL): Linzhuang Sun, Mingyang Chen, Hao Liang, Tianpeng Li, Zhou Yijie, Chenzheng Zhu, Tianyu Guo, Huanyao Zhang, Jingxuan Wei, Bihui Yu, Fan Yang, and Wentao Zhang. 2026. Training Verifier to Assessing Complex Real-World Tool-Use Trajectories. In Findings of the Association for Computational Linguistics: ACL 2026, pages 32922–32936, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Training Verifier to Assessing Complex Real-World Tool-Use Trajectories (Sun et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1647.pdf
Checklist: 2026.findings-acl.1647.checklist.pdf
