ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents

Dawei Li; Yuguang Yao; Zhen Tan; Huan Liu; Ruocheng Guo

Abstract

Reward-guided search methods have demonstrated strong potential for enhancing tool-using agents by guiding sampling and exploration over complex action spaces. At their core, these methods use process reward models (PRMs) to provide step-level rewards, enabling finer-grained monitoring. However, systematic and reliable benchmarks for evaluating PRMs in tool-use settings are lacking. In this paper, we introduce ToolPRMBench, a large-scale benchmark specifically designed to evaluate PRMs for tool-using agents. ToolPRMBench is built on top of several representative tool-use benchmarks and converts agent trajectories into step-level test cases. Each case contains the interaction history, a correct action, a plausible but incorrect alternative, and relevant tool metadata. We use offline sampling to isolate local single-step errors and online sampling to capture realistic multi-step failures from full agent rollouts. We propose a multi-LLM verification pipeline to reduce label noise and ensure data quality. We conduct extensive experiments on ToolPRMBench across large language models, general PRMs, and tool-specialized PRMs. The results reveal clear differences in PRM effectiveness and highlight the potential of specialized PRMs for tool use. Our code and dataset are available at https://github.com/David-Li0406/ToolPRMBench. More resources on LLM-as-a-judge are available at https://llm-as-a-judge.github.io.
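
As a rough illustration of the step-level test case described in the abstract, the sketch below models one case and scores a PRM by how often it prefers the correct action over the incorrect alternative. This is only a sketch under stated assumptions: the field names, the `toy_scorer` placeholder, and the pairwise-accuracy metric are hypothetical and are not taken from the paper or its released code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepCase:
    """One step-level test case (all field names are hypothetical)."""
    history: list[str]        # interaction history up to the current step
    chosen_action: str        # the correct action
    rejected_action: str      # the plausible but incorrect alternative
    tool_metadata: dict[str, str] = field(default_factory=dict)


# A scorer maps (case, candidate action) to a step-level reward.
Scorer = Callable[[StepCase, str], float]


def pairwise_accuracy(cases: list[StepCase], score: Scorer) -> float:
    """Fraction of cases where the correct action strictly outscores the alternative."""
    if not cases:
        return 0.0
    wins = sum(
        score(c, c.chosen_action) > score(c, c.rejected_action) for c in cases
    )
    return wins / len(cases)


def toy_scorer(case: StepCase, action: str) -> float:
    """Placeholder for a real PRM: rewards actions whose tool name is in the metadata."""
    tool_name = action.split("(", 1)[0]
    return 1.0 if tool_name in case.tool_metadata else 0.0


# Two made-up cases: a wrong-tool error and a wrong-argument error.
cases = [
    StepCase(
        history=["user: what is the weather in Paris?"],
        chosen_action="get_weather(city='Paris')",
        rejected_action="get_time(city='Paris')",
        tool_metadata={"get_weather": "Return current weather for a city."},
    ),
    StepCase(
        history=["user: book a table for two"],
        chosen_action="book_table(size=2)",
        rejected_action="book_table(size=20)",
        tool_metadata={"book_table": "Reserve a restaurant table."},
    ),
]
accuracy = pairwise_accuracy(cases, toy_scorer)
```

The placeholder scorer catches the wrong-tool case but ties on the wrong-argument case, which hints at why a step-level benchmark needs incorrect alternatives that are plausible rather than obviously wrong.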

Anthology ID: 2026.findings-acl.602
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12378–12391
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.602/
Cite (ACL): Dawei Li, Yuguang Yao, Zhen Tan, Huan Liu, and Ruocheng Guo. 2026. ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents. In Findings of the Association for Computational Linguistics: ACL 2026, pages 12378–12391, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents (Li et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.602.pdf
Checklist: 2026.findings-acl.602.checklist.pdf
