PhysPRM: A Generative Process Reward Model with Fine-grained Diagnosis for Physics Problem Solving
Yuxuan Dong, Xinyu Zhang, Lingling Zhang, Han Lai, Pengyu Li, Bifan Wei, Yaqiang Wu, Jun Liu
Abstract
Despite the remarkable progress of Large Language Models (LLMs) in abstract reasoning tasks, they continue to struggle with physics problem solving due to difficulties in decoding implicit constraints and maintaining physical consistency. To address these challenges, Process Reward Models (PRMs) have emerged as a promising approach to verify intermediate reasoning steps. Existing PRMs attempt to mitigate reasoning errors but typically rely on scalar scoring, which lacks the explanatory power necessary to diagnose complex physical misconceptions. In this work, we introduce PhysPRM, a Generative PRM that treats evaluation as a generative task to produce fine-grained diagnoses comprising critiques, final judgments, and specific error types. To facilitate this, we develop an automated data synthesis pipeline to construct PhysPRM30K, a comprehensive training dataset, and PhysProcessBench, a rigorously human-verified benchmark. By employing a two-stage training paradigm that integrates Supervised Fine-Tuning with Group Relative Policy Optimization, PhysPRM significantly enhances the physics reasoning capabilities of various LLMs. Extensive experiments demonstrate that PhysPRM improves performance across seven benchmarks in both Best-of-N and critique refinement strategies.
- Anthology ID:
- 2026.findings-acl.458
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 9410–9427
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.458/
- Cite (ACL):
- Yuxuan Dong, Xinyu Zhang, Lingling Zhang, Han Lai, Pengyu Li, Bifan Wei, Yaqiang Wu, and Jun Liu. 2026. PhysPRM: A Generative Process Reward Model with Fine-grained Diagnosis for Physics Problem Solving. In Findings of the Association for Computational Linguistics: ACL 2026, pages 9410–9427, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- PhysPRM: A Generative Process Reward Model with Fine-grained Diagnosis for Physics Problem Solving (Dong et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.458.pdf