Wenkai Fang


2025

OpenRLHF: A Ray-based Easy-to-use, Scalable and High-performance RLHF Framework
Jian Hu | Xibin Wu | Wei Shen | Jason Klein Liu | Weixun Wang | Songlin Jiang | Haoran Wang | Hao Chen | Bin Chen | Wenkai Fang | Xianyu | Yu Cao | Haotian Xu | Yiming Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve alignment with human values and further raise the upper bound of AI capabilities, particularly on reasoning-intensive, long-context Chain-of-Thought (long-CoT) tasks. However, existing RLHF (or RLVR) frameworks commonly face challenges such as inference bottlenecks and complexity barriers, restricting their accessibility to newcomers. To bridge this gap, we introduce OpenRLHF, a user-friendly, scalable, and easy-to-learn open-source RLHF framework built upon Ray, vLLM, DeepSpeed, and HuggingFace Transformers, featuring a simplified design, a clear code structure, and comprehensive documentation to ease entry for researchers and practitioners. Experimental results show that OpenRLHF achieves superior training efficiency, with speedups ranging from 1.22× to 1.68× across model sizes compared with state-of-the-art frameworks, while requiring significantly fewer lines of code to implement. OpenRLHF is publicly available at https://github.com/OpenRLHF/OpenRLHF and has already been adopted by leading institutions to accelerate RLHF research and learning.
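
To illustrate the kind of entry point the abstract describes, the following is a minimal, hypothetical sketch of launching a Ray-based PPO run from Python. The module path (openrlhf.cli.train_ppo_ray), the flag names, and the placeholder model/dataset paths are assumptions modeled on the project's command-line conventions, not a verbatim reproduction of its interface; the repository documentation at https://github.com/OpenRLHF/OpenRLHF should be consulted for the actual options.

# Hypothetical launch sketch for OpenRLHF's Ray-based PPO training.
# The CLI module and flag names below are assumptions; verify them against
# the OpenRLHF repository documentation before use.
import subprocess
import sys

def launch_ppo_training(policy_model: str, reward_model: str, prompt_data: str) -> int:
    """Assemble and run a single-node PPO training command (illustrative only)."""
    cmd = [
        sys.executable, "-m", "openrlhf.cli.train_ppo_ray",  # assumed CLI entry point
        "--pretrain", policy_model,          # base policy checkpoint (HF id or local path)
        "--reward_pretrain", reward_model,   # reward model checkpoint
        "--prompt_data", prompt_data,        # prompt dataset used for rollouts
        "--vllm_num_engines", "2",           # vLLM engines for generation (assumed flag)
        "--actor_num_gpus_per_node", "4",    # GPUs allocated to the actor (assumed flag)
    ]
    return subprocess.run(cmd, check=False).returncode

if __name__ == "__main__":
    # Placeholder paths; substitute real checkpoints and datasets.
    launch_ppo_training(
        policy_model="path/to/policy-model",
        reward_model="path/to/reward-model",
        prompt_data="path/to/prompts.jsonl",
    )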