Out of Distribution, Out of Luck: Process Rewards Misguide Reasoning Models

Alexey Dontsov; Anton Korznikov; Andrey V. Galichin; Elena Tutubalina

Abstract

Process Reward Models (PRMs) have emerged as a promising approach for guiding large language models (LLMs) through multi-step reasoning by providing step-level feedback during inference. However, our evaluation across 7 LLMs reveals a failure mode: while PRMs improve the performance of instruct mathematical models, they fail to improve, and sometimes degrade, the performance of reasoning models. Through systematic analysis with linear probes, we identify distinct reward prediction patterns that differentiate reasoning from non-reasoning model outputs. To understand this mechanism, we train Sparse Autoencoders (SAEs) on Qwen2.5-Math-PRM and analyze reasoning features. Our analysis reveals that 80% of these features respond to formatting artifacts (whitespace patterns, Unicode tokens, punctuation) rather than mathematical content. Reasoning model outputs also exhibit distinct metacognitive patterns that are absent from standard mathematical solutions, which explains why the PRM's reward estimates for these outputs are unreliable. Our findings expose a fundamental limitation in applying existing reward models to reasoning systems and provide mechanistic insights into this failure mode. We release our trained SAEs to facilitate future research into reward model interpretability.

Anthology ID: 2026.eacl-short.31
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 421–435
URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.31/
Cite (ACL): Alexey Dontsov, Anton Korznikov, Andrey V. Galichin, and Elena Tutubalina. 2026. Out of Distribution, Out of Luck: Process Rewards Misguide Reasoning Models. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 421–435, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Out of Distribution, Out of Luck: Process Rewards Misguide Reasoning Models (Dontsov et al., EACL 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-short.31.pdf
Checklist: 2026.eacl-short.31.checklist.pdf
