Prototypical Reward Network for Data-Efficient Model Alignment
Jinghan Zhang, Xiting Wang, Yiqiao Jin, Changyu Chen, Xinhao Zhang, Kunpeng Liu
Abstract
The reward model for Reinforcement Learning from Human Feedback (RLHF) has proven effective in fine-tuning Large Language Models (LLMs). This paper explores enhancing RLHF with Prototypical Networks to improve reward models. We propose a framework utilizing Prototypical Networks to enhance reward models under limited human feedback, enabling more stable and reliable structural learning from fewer samples. This enhances the model’s adaptability and accuracy in interpreting human preferences. Our experiments demonstrate that this approach significantly improves the performance of reward models and LLMs in human feedback tasks, surpassing traditional methods, especially in data-limited scenarios.
- Anthology ID:
- 2024.acl-long.748
- Volume:
- Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 13871–13884
- URL:
- https://aclanthology.org/2024.acl-long.748
- Cite (ACL):
- Jinghan Zhang, Xiting Wang, Yiqiao Jin, Changyu Chen, Xinhao Zhang, and Kunpeng Liu. 2024. Prototypical Reward Network for Data-Efficient Model Alignment. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13871–13884, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- Prototypical Reward Network for Data-Efficient Model Alignment (Zhang et al., ACL 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.acl-long.748.pdf
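The abstract names two standard ingredients without spelling out how they are combined. The sketch below illustrates each one in isolation using only the Python standard library. It is not the method proposed in this paper. The first ingredient is a Prototypical-Network-style prototype: the mean embedding of a group of examples, with a query assigned to groups by a softmax over negative squared Euclidean distances. The second is the Bradley-Terry pairwise loss commonly used to train RLHF reward models. The toy 2-D embeddings, the "preferred"/"dispreferred" grouping, and the helper `prototype_reward` are hypothetical illustrations.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def prototype(embeddings: Sequence[Vector]) -> Vector:
    """Prototype of a group: the element-wise mean of its embeddings."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_probs(query: Vector, prototypes: Dict[str, Vector]) -> Dict[str, float]:
    """Softmax over negative squared distances from the query to each prototype."""
    logits = {k: -squared_distance(query, p) for k, p in prototypes.items()}
    m = max(logits.values())  # subtract the max logit for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def prototype_reward(query: Vector, preferred: Vector, dispreferred: Vector) -> float:
    """Hypothetical scalar reward: how much closer the query is to the
    'preferred' prototype than to the 'dispreferred' one (a logit difference)."""
    return squared_distance(query, dispreferred) - squared_distance(query, preferred)


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss -log(sigmoid(r_chosen - r_rejected)), computed stably."""
    diff = r_chosen - r_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    # for diff < 0: log(1 + e^{-diff}) = -diff + log(1 + e^{diff})
    return -diff + math.log1p(math.exp(diff))


# Toy usage with hypothetical embeddings of preferred and dispreferred responses.
pos = prototype([[1.0, 0.0], [0.8, 0.2]])  # mean is [0.9, 0.1]
neg = prototype([[0.0, 1.0], [0.2, 0.8]])  # mean is [0.1, 0.9]
probs = prototype_probs([0.7, 0.3], {"preferred": pos, "dispreferred": neg})
r_a = prototype_reward([0.7, 0.3], pos, neg)
r_b = prototype_reward([0.3, 0.7], pos, neg)
loss = pairwise_reward_loss(r_a, r_b)  # small when response A is ranked above B
```

Averaging embeddings into one prototype per group is what lets a prototypical approach form a usable decision boundary from only a few labelled examples. The pairwise loss only depends on the difference of two scalar rewards, so any scoring function, prototype-based or not, can be plugged into it.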