P-Check: Advancing Personalized Reward Model via Learning to Generate Dynamic Checklist

Kwangwook Seo; Dongha Lee

Abstract

Recent work on personalized reward modeling has primarily focused on leveraging user interaction history to align model judgments with individual preferences. However, existing approaches largely treat user context as a static or implicit conditioning signal, failing to capture the dynamic and multi-faceted nature of human judgment. In this paper, we propose P-Check, a personalized reward modeling framework that trains a plug-and-play checklist generator to synthesize dynamic evaluation criteria for guiding reward prediction. To better align these checklists with personalized nuances, we introduce Preference-Contrastive Criterion Weighting, a training strategy that assigns saliency scores to criteria based on their discriminative power for personalized judgment. Extensive experiments demonstrate that P-Check not only improves reward accuracy but also enhances downstream personalized generation, and that it remains robust in out-of-distribution (OOD) scenarios.
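
The abstract does not give the formula behind Preference-Contrastive Criterion Weighting, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each checklist criterion yields a numeric score for a preferred and a rejected response, and it swaps in a plain softmax over the per-criterion score gaps as a stand-in saliency measure. All names (`saliency_weights`, `weighted_reward`, the example criteria and scores) are hypothetical.

```python
import math

def saliency_weights(chosen_scores, rejected_scores, temperature=1.0):
    """Assumed stand-in for criterion saliency: softmax over score gaps.

    A criterion whose score separates the preferred response from the
    rejected one by a larger margin receives a larger weight.
    """
    gaps = [c - r for c, r in zip(chosen_scores, rejected_scores)]
    peak = max(gaps)  # subtract the max for numerical stability
    exps = [math.exp((g - peak) / temperature) for g in gaps]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_reward(scores, weights):
    """Aggregate per-criterion scores into one scalar reward."""
    return sum(s * w for s, w in zip(scores, weights))

# Hypothetical checklist for one user and one query.
criteria = ["concise", "cites sources", "casual tone"]
chosen = [0.9, 0.5, 0.8]    # per-criterion scores, preferred response
rejected = [0.2, 0.5, 0.7]  # per-criterion scores, rejected response

weights = saliency_weights(chosen, rejected)
for name, weight in zip(criteria, weights):
    print(f"{name}: {weight:.3f}")
```

In this toy setup the "concise" criterion gets the largest weight because it is the one that most clearly separates the two responses, while "cites sources", which scores both responses identically, gets the smallest.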

Anthology ID: 2026.acl-long.2011
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 43447–43471
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2011/
Cite (ACL): Kwangwook Seo and Dongha Lee. 2026. P-Check: Advancing Personalized Reward Model via Learning to Generate Dynamic Checklist. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 43447–43471, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): P-Check: Advancing Personalized Reward Model via Learning to Generate Dynamic Checklist (Seo & Lee, ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2011.pdf
Checklist: 2026.acl-long.2011.checklist.pdf
