PRISM: Probabilistic Reward Model with Inherent Structural Modeling

Yuhang Zhou (周宇航); Yixin Cao; Yuchen Ni; Shihan Dou; Xutian Chen; Ge Zhang; Xiang Liu; Guangnan Ye (叶广楠)

Abstract

Standard evaluators, such as reward models, compress diverse human judgments into a single scalar, conflating valid Subjective Preference with Cognitive Uncertainty. This structural mismatch often leads to brittle alignment and reward hacking. To address this, we propose PRISM, which reinterprets reward evaluation as a conditional distribution parameterized by a Mixture of Gaussians. PRISM structurally disentangles the two factors: distinct Gaussian experts emerge to capture conflicting preference dimensions, while their variance estimates quantify uncertainty and act as a dynamic reliability gate during optimization. We introduce a two-stage training strategy that learns these disentangled representations from scalable pairwise comparisons, without requiring massive fine-grained annotations. Empirical results show that PRISM significantly outperforms scalar baselines in both accuracy and generalization. Furthermore, in downstream Reinforcement Learning, PRISM effectively mitigates reward hacking, yielding policies that are more robust to distribution shifts.
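
To make the idea of a mixture-based reward concrete, the following is a minimal sketch of how a reward expressed as a one-dimensional Mixture of Gaussians could be summarized and gated by its variance. It is not the authors' implementation: the function names, the `temperature` parameter, and the specific gate `1 / (1 + variance / temperature)` are assumptions chosen purely for illustration, since the abstract does not specify the form of the reliability gate.

```python
import math


def mixture_moments(weights, means, variances):
    """Return the mean and variance of a 1-D Gaussian mixture.

    The variance follows the law of total variance: the weighted
    within-component variance plus the spread of component means.
    """
    total = sum(weights)
    w = [x / total for x in weights]  # normalize the mixing weights
    mean = sum(wi * mi for wi, mi in zip(w, means))
    var = sum(
        wi * (vi + (mi - mean) ** 2)
        for wi, vi, mi in zip(w, variances, means)
    )
    return mean, var


def gated_reward(weights, means, variances, temperature=1.0):
    """Hypothetical reliability gate: shrink the mean reward as variance grows.

    The gate form is an assumption for illustration, not taken from the paper.
    """
    mean, var = mixture_moments(weights, means, variances)
    gate = 1.0 / (1.0 + var / temperature)
    return gate * mean


if __name__ == "__main__":
    # Two equally weighted "experts" that disagree about the reward.
    mean, var = mixture_moments([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])
    print(mean, var)
    print(gated_reward([0.5, 0.5], [0.0, 2.0], [1.0, 1.0]))
```

In this sketch, disagreement between experts (means far apart) and high per-expert variance both raise the total variance, so the gated reward contributes less to the optimization signal.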

Anthology ID: 2026.acl-long.563
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 12345–12362
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.563/
Cite (ACL): Yuhang Zhou, Yixin Cao, Yuchen Ni, Shihan Dou, Xutian Chen, Ge Zhang, Xiang Liu, and Guangnan Ye. 2026. PRISM: Probabilistic Reward Model with Inherent Structural Modeling. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12345–12362, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): PRISM: Probabilistic Reward Model with Inherent Structural Modeling (Zhou et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.563.pdf
Checklist: 2026.acl-long.563.checklist.pdf
