Out of Distribution, Out of Luck: Process Rewards Misguide Reasoning Models
Alexey Dontsov, Anton Korznikov, Andrey V. Galichin, Elena Tutubalina
Abstract
Process Reward Models (PRMs) have emerged as a promising approach for guiding large language models (LLMs) through multi-step reasoning by providing step-level feedback during inference. However, our evaluation across 7 LLMs reveals a failure mode: while PRMs improve the performance of instruction-tuned mathematical models, they fail to improve, and sometimes degrade, the performance of reasoning models. Through systematic analysis with linear probes, we identify distinct reward prediction patterns that differentiate reasoning from non-reasoning model outputs. To understand this mechanism, we train Sparse Autoencoders (SAEs) on Qwen2.5-Math-PRM and analyze its reasoning features. Our analysis reveals that 80% of these features respond to formatting artifacts (whitespace patterns, Unicode tokens, punctuation) rather than mathematical content. Reasoning model outputs exhibit distinct metacognitive patterns that are absent from standard mathematical solutions, which explains why these outputs lead to unreliable reward estimation. Our findings expose a fundamental limitation in applying existing reward models to reasoning systems and provide mechanistic insights into this failure mode. We release our trained SAEs to facilitate future research into reward model interpretability.
- Anthology ID:
- 2026.eacl-short.31
- Volume:
- Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 421–435
- URL:
- https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.31/
- Cite (ACL):
- Alexey Dontsov, Anton Korznikov, Andrey V. Galichin, and Elena Tutubalina. 2026. Out of Distribution, Out of Luck: Process Rewards Misguide Reasoning Models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 421–435, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- Out of Distribution, Out of Luck: Process Rewards Misguide Reasoning Models (Dontsov et al., EACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.31.pdf