Training Verifier to Assessing Complex Real-World Tool-Use Trajectories
Linzhuang Sun, Mingyang Chen, Hao Liang, Tianpeng Li, Zhou Yijie, Chenzheng Zhu, Tianyu Guo, Huanyao Zhang, Jingxuan Wei, Bihui Yu, Fan Yang, Wentao Zhang
Abstract
Training effective AI agents for real-world tool-use interactions requires data that faithfully captures the dynamics of human–agent collaboration. However, such data is scarce, and existing methods often resort to synthetic data generation. The inherently dynamic and complex nature of user–agent interactions makes ensuring data quality particularly challenging. Current verification approaches are typically entangled with the synthesis process itself, resulting in complicated implementations that undermine both reproducibility and scalability. To address this, we introduce Tool-Verifier-7B, a plug-and-play framework for data quality control in tool-use scenarios. Building on this verifier and our data synthesis strategy, we construct the Tool-Verify dataset, which contains 3,295 curated samples. To directly assess verifier performance, we further release Tool-V-Bench, a benchmark of 165 human-validated trajectories spanning diverse interaction complexities. Comprehensive experiments show that Tool-Verifier-7B surpasses Qwen2.5-72B-Instruct on Tool-V-Bench. Moreover, the Tool-Verify dataset achieves superior performance compared to the previous APIGen-MT dataset.
- Anthology ID:
- 2026.findings-acl.1647
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 32922–32936
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1647/
- Cite (ACL):
- Linzhuang Sun, Mingyang Chen, Hao Liang, Tianpeng Li, Zhou Yijie, Chenzheng Zhu, Tianyu Guo, Huanyao Zhang, Jingxuan Wei, Bihui Yu, Fan Yang, and Wentao Zhang. 2026. Training Verifier to Assessing Complex Real-World Tool-Use Trajectories. In Findings of the Association for Computational Linguistics: ACL 2026, pages 32922–32936, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Training Verifier to Assessing Complex Real-World Tool-Use Trajectories (Sun et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1647.pdf