ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents

Dawei Li, Yuguang Yao, Zhen Tan, Huan Liu, Ruocheng Guo


Abstract
Reward-guided search methods have demonstrated strong potential for enhancing tool-using agents by guiding sampling and exploration over complex action spaces. At their core, these search methods use process reward models (PRMs) to provide step-level rewards, enabling more fine-grained monitoring. However, systematic and reliable evaluation benchmarks for PRMs in tool-use settings are lacking. In this paper, we introduce ToolPRMBench, a large-scale benchmark specifically designed to evaluate PRMs for tool-using agents. ToolPRMBench is built on top of several representative tool-use benchmarks and converts agent trajectories into step-level test cases. Each case contains the interaction history, a correct action, a plausible but incorrect alternative, and relevant tool metadata. We use offline sampling to isolate local single-step errors and online sampling to capture realistic multi-step failures from full agent rollouts. We propose a multi-LLM verification pipeline to reduce label noise and ensure data quality. We conduct extensive experiments across large language models, general PRMs, and tool-specialized PRMs on ToolPRMBench. The results reveal clear differences in PRM effectiveness and highlight the potential of specialized PRMs for tool use. Our code and dataset are available at https://github.com/David-Li0406/ToolPRMBench. More resources on LLM-as-a-judge are available at https://llm-as-a-judge.github.io.
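The step-level test case described in the abstract can be pictured with a short Python sketch. Everything named here is an illustrative assumption and not the benchmark's actual schema or metric: the `StepCase` fields, the `pairwise_accuracy` function, and the toy scorer are hypothetical, and the abstract does not confirm that PRMs are scored by comparing the correct action against the incorrect one.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepCase:
    """One step-level test case (hypothetical field names)."""
    history: list[str]            # interaction history up to this step
    correct_action: str           # the correct action
    incorrect_action: str         # a plausible but incorrect alternative
    tool_metadata: dict = field(default_factory=dict)  # relevant tool metadata


# A scorer stands in for a PRM: it maps (history, action, metadata) to a reward.
Scorer = Callable[[list[str], str, dict], float]


def pairwise_accuracy(cases: list[StepCase], score: Scorer) -> float:
    """Fraction of cases where the correct action gets a strictly higher reward.

    Assumed metric for illustration only.
    """
    if not cases:
        return 0.0
    wins = sum(
        score(c.history, c.correct_action, c.tool_metadata)
        > score(c.history, c.incorrect_action, c.tool_metadata)
        for c in cases
    )
    return wins / len(cases)


def toy_scorer(history: list[str], action: str, tool_metadata: dict) -> float:
    """Hypothetical stand-in for a PRM: rewards actions that name a known tool."""
    tool_name = action.split("(", 1)[0]
    return 1.0 if tool_name in tool_metadata else 0.0


cases = [
    StepCase(
        history=["user: what is the weather in Paris?"],
        correct_action="get_weather(city='Paris')",
        incorrect_action="get_wether(city='Paris')",  # misspelled tool name
        tool_metadata={"get_weather": "Look up current weather by city."},
    ),
    StepCase(
        history=["user: book a table for two"],
        correct_action="book_table(size=2)",
        incorrect_action="cancel_table(id=1)",  # valid tool, wrong intent
        tool_metadata={
            "book_table": "Reserve a table.",
            "cancel_table": "Cancel a reservation.",
        },
    ),
]

accuracy = pairwise_accuracy(cases, toy_scorer)
```

The toy scorer separates the first pair but ties on the second, where both actions name valid tools. This illustrates why a plausible but incorrect alternative is harder to reject than a malformed call.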
Anthology ID:
2026.findings-acl.602
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
12378–12391
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.602/
Cite (ACL):
Dawei Li, Yuguang Yao, Zhen Tan, Huan Liu, and Ruocheng Guo. 2026. ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 12378–12391, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents (Li et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.602.pdf
Checklist:
2026.findings-acl.602.checklist.pdf