2025
OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework
Jian Hu | Xibin Wu | Wei Shen | Jason Klein Liu | Weixun Wang | Songlin Jiang | Haoran Wang | Hao Chen | Bin Chen | Wenkai Fang | Xianyu | Yu Cao | Haotian Xu | Yiming Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of AI with human values and further raise the upper bound of AI capabilities, particularly in reasoning-intensive, long-context Chain-of-Thought (long-CoT) tasks. However, existing RLHF (or RLVR) frameworks commonly face challenges such as inference bottlenecks and complexity barriers, restricting their accessibility to newcomers. To bridge this gap, we introduce OpenRLHF, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, clear code structure, and comprehensive documentation to facilitate entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superior training efficiency, with speedups ranging from 1.22× to 1.68× across different model sizes compared to state-of-the-art frameworks, while requiring significantly fewer lines of code for implementation. OpenRLHF is publicly available at https://github.com/OpenRLHF/OpenRLHF and has already been adopted by leading institutions to accelerate RLHF research and learning.
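The abstract only names the building blocks (Ray for distributed scheduling, vLLM for generation, DeepSpeed and HuggingFace Transformers for training). The sketch below is illustrative only and is not OpenRLHF's actual API: it uses plain Ray actors to show the general pattern of placing separate RLHF roles (a policy/rollout worker and a reward worker) on separate GPUs. All class names, method names, and return values are hypothetical placeholders; see the linked repository for the framework's real entry points.

```python
# Illustrative sketch only -- NOT OpenRLHF's API. Shows the generic Ray-actor
# pattern of hosting separate RLHF roles on separate GPU workers.
import ray

ray.init()

@ray.remote(num_gpus=1)  # reserves one GPU per role; drop num_gpus to try on CPU
class PolicyWorker:
    """Hypothetical rollout worker; a real system would generate with vLLM."""
    def generate(self, prompts):
        return [f"<placeholder response to: {p}>" for p in prompts]

@ray.remote(num_gpus=1)
class RewardWorker:
    """Hypothetical reward worker; a real system would score with a reward model."""
    def score(self, prompts, responses):
        return [0.0 for _ in responses]

policy = PolicyWorker.remote()
reward = RewardWorker.remote()

prompts = ["Explain RLHF in one sentence."]
responses = ray.get(policy.generate.remote(prompts))
scores = ray.get(reward.score.remote(prompts, responses))
print(list(zip(responses, scores)))
```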