Mingyang Song

Other people with similar names: Mingyang Song

Unverified author pages with similar names: Mingyang Song

2025

pdf bib abs
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models
Mingyang Song | Zhaochen Su | Xiaoye Qu | Jiawei Zhou | Yu Cheng
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Process-level Reward Models (PRMs) are crucial for complex reasoning and decision-making tasks, where each intermediate step plays an important role in the reasoning process. Since language models are prone to various types of errors during the reasoning process, PRMs are required to possess nuanced capabilities for detecting various implicit error types in real-world scenarios. However, current benchmarks primarily focus on step correctness, failing to evaluate PRMs’ performance systematically. To address this gap, we introduce PRMBench, a process-level benchmark specifically designed to assess the fine-grained error detection capabilities of PRMs. PRMBench comprises 6,216 carefully designed problems and 83,456 step-level labels, evaluating models across multiple dimensions, including simplicity, soundness, and sensitivity. In our experiments on 25 models, spanning both open-source PRMs and closed-source large language models prompted as critic models, we uncover significant weaknesses in current PRMs. These findings underscore the challenges inherent in process-level evaluation and highlight key directions for future research, establishing PRMBench as a robust testbed for advancing research on PRM evaluation and development.

Co-authors

Venues

acl1

Fix data

Mingyang Song

Fixing paper assignments

2025

Co-authors

Venues