PhysPRM: A Generative Process Reward Model with Fine-grained Diagnosis for Physics Problem Solving

Yuxuan Dong, Xinyu Zhang, Lingling Zhang, Han Lai, Pengyu Li, Bifan Wei, Yaqiang Wu, Jun Liu


Abstract
Despite the remarkable progress of Large Language Models (LLMs) in abstract reasoning tasks, they continue to struggle with physics problem solving due to difficulties in decoding implicit constraints and maintaining physical consistency. To address these challenges, Process Reward Models (PRMs) have emerged as a promising approach to verify intermediate reasoning steps. Existing PRMs attempt to mitigate reasoning errors but typically rely on scalar scoring, which lacks the explanatory power necessary to diagnose complex physical misconceptions. In this work, we introduce PhysPRM, a Generative PRM that treats evaluation as a generative task to produce fine-grained diagnoses comprising critiques, final judgments, and specific error types. To facilitate this, we develop an automated data synthesis pipeline to construct PhysPRM30K, a comprehensive training dataset, and PhysProcessBench, a rigorously human-verified benchmark. By employing a two-stage training paradigm that integrates Supervised Fine-Tuning with Group Relative Policy Optimization, PhysPRM significantly enhances the physics reasoning capabilities of various LLMs. Extensive experiments demonstrate that PhysPRM improves performance across seven benchmarks in both Best-of-N and critique refinement strategies.
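As a rough illustration of how Best-of-N selection could use such diagnoses, the sketch below scores each candidate solution by the fraction of steps judged correct and picks the highest-scoring one. The `StepDiagnosis` fields mirror the critique, judgment, and error type named in the abstract, but the class, the function names, and the aggregation rule are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepDiagnosis:
    """Hypothetical container for one step's diagnosis."""
    critique: str                      # free-text explanation of the step
    is_correct: bool                   # final judgment for the step
    error_type: Optional[str] = None   # label for the error, if any


def solution_score(diagnoses: List[StepDiagnosis]) -> float:
    """Fraction of steps judged correct (an assumed aggregation rule)."""
    if not diagnoses:
        return 0.0
    return sum(d.is_correct for d in diagnoses) / len(diagnoses)


def best_of_n(candidates: List[List[StepDiagnosis]]) -> int:
    """Return the index of the highest-scoring candidate.

    Ties are broken in favor of the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate solution")
    scores = [solution_score(c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Two made-up candidates: the first contains a flawed step, the second does not.
candidates = [
    [
        StepDiagnosis("Free-body diagram is set up correctly.", True),
        StepDiagnosis("Friction is ignored despite a rough surface.", False,
                      "missing_constraint"),
    ],
    [
        StepDiagnosis("Free-body diagram is set up correctly.", True),
        StepDiagnosis("Friction is included with the right sign.", True),
    ],
]
chosen = best_of_n(candidates)
```

A scalar-only PRM would supply just the score; keeping the critique and error type alongside the judgment is what would let a refinement loop feed the diagnosis back to the solver.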
Anthology ID:
2026.findings-acl.458
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9410–9427
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.458/
Cite (ACL):
Yuxuan Dong, Xinyu Zhang, Lingling Zhang, Han Lai, Pengyu Li, Bifan Wei, Yaqiang Wu, and Jun Liu. 2026. PhysPRM: A Generative Process Reward Model with Fine-grained Diagnosis for Physics Problem Solving. In Findings of the Association for Computational Linguistics: ACL 2026, pages 9410–9427, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
PhysPRM: A Generative Process Reward Model with Fine-grained Diagnosis for Physics Problem Solving (Dong et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.458.pdf
Checklist:
 2026.findings-acl.458.checklist.pdf