Chao Xin


2025

Lost in the Context: Insufficient and Distracted Attention to Contexts in Preference Modeling
Shihan Dou | Jiayi Chen | Chenhao Huang | Feng Chen | Wei Chengzhi | Huiyuan Zheng | Shichun Liu | Yan Liu | Chenxiao Liu | Chao Xin | Lin Yan | Zongzhang Zhang | Tao Gui | Qi Zhang | Xuanjing Huang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In Reinforcement Learning from Human Feedback (RLHF), the reward model (RM) evaluates response quality based on the given context and assigns a reward, playing a crucial role in aligning models with human preferences. Although the current RM training paradigm concatenates the context and response and amplifies the reward difference between good and bad response pairs, we demonstrate that the RM faces two significant issues: i) it often allocates only a small proportion of its attention to the context, and ii) it frequently ignores segments of the context that are relevant for evaluating response quality. These issues undermine the RM’s effectiveness in modeling human preferences. To address these challenges, we propose AttnRM, a novel optimization framework that enables the RM to concentrate on crucial segments of the context. Experimental results demonstrate that AttnRM significantly improves preference modeling by increasing attention to relevant information within the context. It also enhances the RM’s generalizability and achieves better alignment with human preferences.
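
The "current RM training paradigm" the abstract refers to is the standard pairwise (Bradley-Terry) objective, which scores the concatenation of context and response and maximizes the reward margin between the preferred and rejected responses. Below is a minimal sketch of that baseline objective for orientation; it does not implement AttnRM itself, and the `rm` model and input names in the usage comments are hypothetical.

```python
import torch
import torch.nn.functional as F

def pairwise_rm_loss(reward_chosen: torch.Tensor,
                     reward_rejected: torch.Tensor) -> torch.Tensor:
    """Standard Bradley-Terry pairwise loss: push the scalar reward of the
    preferred (chosen) response above that of the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical usage: `rm` scores the concatenation of a shared context with
# each candidate response and returns one scalar per example.
# rewards_chosen = rm(context_plus_chosen_ids)      # shape: (batch,)
# rewards_rejected = rm(context_plus_rejected_ids)  # shape: (batch,)
# loss = pairwise_rm_loss(rewards_chosen, rewards_rejected)
```

Because this objective only amplifies the reward difference between the paired responses, nothing in it explicitly encourages the model to attend to the shared context, which is the failure mode the paper analyzes and that AttnRM is designed to correct.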