Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback

Jiaye Lin, Mengdi Li, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, Di Wang


Abstract
Reward models trained through Reinforcement Learning from AI Feedback (RLAIF) methods frequently suffer from limited generalizability, which hinders the alignment performance of policy models. This challenge stems from various issues, including distribution shift, preference label noise, and mismatch of overly challenging samples with model capacity. In this paper, we aim to enhance the generalizability of reward models through a data-centric approach, driven by the insight that these issues are inherently intertwined from a uniform perspective of data difficulty. Accordingly, we propose a novel framework, Curriculum-RLAIF, which constructs preference pairs with varying difficulty levels and then produces a specific curriculum for reward model training. Comprehensive experimental results suggest that reward models trained with Curriculum-RLAIF achieve improved generalizability, boosting the alignment performance of policy models by a significant margin without incurring additional inference costs compared to various existing non-curriculum baselines. Further analysis and comparison with alternative strategies highlight the superiority of Curriculum-RLAIF in simplicity, efficiency, and effectiveness.
Anthology ID:
2026.findings-acl.1685
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
33757–33777
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1685/
DOI:
Bibkey:
Cite (ACL):
Jiaye Lin, Mengdi Li, Xufeng Zhao, Wenhao Lu, Peilin Zhao, Stefan Wermter, and Di Wang. 2026. Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback. In Findings of the Association for Computational Linguistics: ACL 2026, pages 33757–33777, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Curriculum-RLAIF: Curriculum Alignment with Reinforcement Learning from AI Feedback (Lin et al., Findings 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1685.pdf
Checklist:
 2026.findings-acl.1685.checklist.pdf