Jinghan Zhang
2024
Prototypical Reward Network for Data-Efficient Model Alignment
Jinghan Zhang
|
Xiting Wang
|
Yiqiao Jin
|
Changyu Chen
|
Xinhao Zhang
|
Kunpeng Liu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven effective in fine-tuning Large Language Models (LLMs). This paper explores enhancing RLHF with Prototypical Networks to improve reward models. We propose a framework utilizing Prototypical Networks to enhance reward models under limited human feedback, enabling more stable and reliable structural learning from fewer samples. This enhances the model’s adaptability and accuracy in interpreting human preferences. Our experiments demonstrate that this approach significantly improves the performance of reward models and LLMs in human feedback tasks, surpassing traditional methods, especially in data-limited scenarios.