Han Lai
2026
PhysPRM: A Generative Process Reward Model with Fine-grained Diagnosis for Physics Problem Solving
Yuxuan Dong | Xinyu Zhang | Lingling Zhang | Han Lai | Pengyu Li | Bifan Wei | Yaqiang Wu | Jun Liu
Findings of the Association for Computational Linguistics: ACL 2026
Despite the remarkable progress of Large Language Models (LLMs) in abstract reasoning tasks, they continue to struggle with physics problem solving due to difficulties in decoding implicit constraints and maintaining physical consistency. To address these challenges, Process Reward Models (PRMs) have emerged as a promising approach to verify intermediate reasoning steps. Existing PRMs attempt to mitigate reasoning errors but typically rely on scalar scoring, which lacks the explanatory power necessary to diagnose complex physical misconceptions. In this work, we introduce PhysPRM, a Generative PRM that treats evaluation as a generative task to produce fine-grained diagnoses comprising critiques, final judgments, and specific error types. To facilitate this, we develop an automated data synthesis pipeline to construct PhysPRM30K, a comprehensive training dataset, and PhysProcessBench, a rigorously human-verified benchmark. By employing a two-stage training paradigm that integrates Supervised Fine-Tuning with Group Relative Policy Optimization, PhysPRM significantly enhances the physics reasoning capabilities of various LLMs. Extensive experiments demonstrate that PhysPRM improves performance across seven benchmarks in both Best-of-N and critique refinement strategies.
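As a rough illustration of the Best-of-N strategy mentioned in the abstract, the sketch below ranks candidate solutions using per-step diagnoses made up of a critique, a judgment, and an error type. It is not the authors' implementation: the names `StepDiagnosis`, `solution_score`, `best_of_n`, and the `judge` callable are hypothetical, and scoring a solution by its fraction of steps judged correct is an assumed aggregation rule, not one stated in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepDiagnosis:
    """Hypothetical container for one step's generated diagnosis."""
    critique: str                     # free-text explanation of the step
    is_correct: bool                  # final judgment for the step
    error_type: Optional[str] = None  # e.g. a misconception label, if any


def solution_score(diagnoses: List[StepDiagnosis]) -> float:
    """Fraction of steps judged correct (assumed aggregation rule)."""
    if not diagnoses:
        return 0.0
    return sum(d.is_correct for d in diagnoses) / len(diagnoses)


def best_of_n(
    candidates: List[List[str]],
    judge: Callable[[List[str]], List[StepDiagnosis]],
) -> int:
    """Return the index of the highest-scoring candidate solution.

    `candidates` holds N solutions, each a list of reasoning steps.
    `judge` stands in for a generative PRM call (hypothetical interface).
    Ties resolve to the earliest candidate.
    """
    scores = [solution_score(judge(steps)) for steps in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

In practice `judge` would wrap a model call that generates the critiques; any function with the same signature can be substituted for testing.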