CodePRM: Execution Feedback-enhanced Process Reward Model for Code Generation
Qingyao Li | Xinyi Dai | Xiangyang Li | Weinan Zhang | Yasheng Wang | Ruiming Tang | Yong Yu
Findings of the Association for Computational Linguistics: ACL 2025
Code generation is a critical reasoning task for large language models (LLMs). Recent advances have focused on optimizing the thought process behind code generation, achieving significant improvements. However, this thought process lacks effective process supervision, making the thoughts hard to optimize. Although Process Reward Models (PRMs) are well established in mathematical reasoning, building a code PRM remains non-trivial because of the gap between thoughts and code. In this paper, we propose CodePRM, a novel approach that leverages code execution feedback to build a code PRM. Specifically, we first collect a large dataset of thought traces, in which each thought step is labeled with the pass rate of the code derived from it, accompanied by the corresponding code snippets and execution feedback. We then train a PRM that takes both the reasoning process and the code execution feedback as input to score individual thought steps, enabling it to leverage execution results to distinguish high-quality from low-quality thought steps. Finally, to use the PRM during inference, we develop a Generate-Verify-Refine (GVR) pipeline in which CodePRM serves as a process verifier, dynamically identifying and correcting errors in the thought process during code search. Experimental results show that CodePRM combined with this inference algorithm outperforms strong baselines, significantly enhancing code generation performance. Further analysis reveals the key factors for building a code PRM.
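The abstract describes the Generate-Verify-Refine (GVR) pipeline only at a high level. The sketch below illustrates what such a loop could look like; it is an assumption-laden illustration, not the paper's implementation. The interfaces `llm.generate_thought_step`, `llm.generate_code`, `llm.refine_thought_step`, `prm.score`, `run_tests`, and the score threshold are all hypothetical placeholders.

```python
# Hypothetical sketch of a Generate-Verify-Refine (GVR) loop as described in the
# abstract: a PRM scores each thought step using code execution feedback, and
# low-scoring steps are refined before the search continues.
# All function names and the threshold below are illustrative assumptions.

def gvr_search(problem, llm, prm, run_tests,
               max_steps=8, score_threshold=0.5, max_refinements=2):
    thoughts = []  # accepted thought steps so far
    code = None
    for _ in range(max_steps):
        step = llm.generate_thought_step(problem, thoughts)       # propose next thought
        code = llm.generate_code(problem, thoughts + [step])      # derive code from thoughts
        feedback = run_tests(code, problem.public_tests)          # execution feedback
        score = prm.score(problem, thoughts, step, code, feedback)  # PRM verifies the step

        # Refine the step while the PRM judges it to be low quality.
        refinements = 0
        while score < score_threshold and refinements < max_refinements:
            step = llm.refine_thought_step(problem, thoughts, step, feedback)
            code = llm.generate_code(problem, thoughts + [step])
            feedback = run_tests(code, problem.public_tests)
            score = prm.score(problem, thoughts, step, code, feedback)
            refinements += 1

        thoughts.append(step)
        if feedback.all_passed:  # stop once the derived code passes the public tests
            return code
    return code
```

The key design point this sketch tries to convey is that the PRM sees both the reasoning trace and the execution feedback of the derived code, so its step-level scores can trigger targeted refinement rather than restarting the whole generation.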