A Comprehensive Survey of Process Reward Models: Data Generation, Model Construction, and Usage

Congmin Zheng; Jiachen Zhu; Zhuoying Ou; Yuxiang Chen; Kangning Zhang; Rong Shan; Zeyu Zheng; Mengyue Yang; Jianghao Lin; Yong Yu; Weinan Zhang

Abstract

Large Language Models (LLMs) have advanced reasoning ability, yet conventional alignment remains dominated by outcome reward models (ORMs) that judge only final answers. Process Reward Models (PRMs) address this gap by evaluating and guiding reasoning at the step or trajectory level. This survey provides a systematic overview of PRMs through the full loop: how to generate process data, build PRMs, and use PRMs for test-time scaling and reinforcement learning. We summarize applications across math, code, text, multimodal reasoning, robotics, and agents, and review emerging benchmarks. Our goal is to clarify design spaces, reveal open challenges, and guide future research toward fine-grained, robust reasoning alignment.

Anthology ID: 2026.acl-long.163
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 3591–3607
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.163/
Cite (ACL): Congmin Zheng, Jiachen Zhu, Zhuoying Ou, Yuxiang Chen, Kangning Zhang, Rong Shan, Zeyu Zheng, Mengyue Yang, Jianghao Lin, Yong Yu, and Weinan Zhang. 2026. A Comprehensive Survey of Process Reward Models: Data Generation, Model Construction, and Usage. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3591–3607, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): A Comprehensive Survey of Process Reward Models: Data Generation, Model Construction, and Usage (Zheng et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.163.pdf
Checklist: 2026.acl-long.163.checklist.pdf
