Reinforcement Learning with Supervised Alignment

João Luís Lins, Jia Xu


Abstract
Supervised fine-tuning (SFT) is a widely used and highly effective method for adapting Large Language Models (LLMs) to specific tasks. However, it often suffers from overfitting, causing models to excel on fine-tuned data but struggle with unseen or rare real-world inputs. While recent methods such as Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) aim to align LLMs with human values and tasks, they face challenges such as the high cost of human labeling or the instabilities and biases inherent in using LLMs as judges. To address these issues, we propose a novel approach called Reinforcement Learning from Supervised Alignment (RLA), which constructs a supervised alignment to train the reward model for reinforcement learning. Using only 100,000 MS MARCO samples, our method outperforms RLAIF by relative margins ranging from +5.38% to +131.8%. It also significantly improves the baseline Llama3 LLM, by up to +55% on in-domain tasks and up to +16% on out-of-domain tasks. While RLA slightly underperforms SFT on in-domain benchmarks, it surpasses SFT by up to a factor of 50 on out-of-domain and cross-task evaluations, demonstrating strong generalization capabilities.
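The abstract does not spell out how the supervised alignment is turned into a reward signal, so the following is only a minimal sketch of one standard reading, not the authors' implementation: treat the gold supervised answer as the preferred response and a model sample as the rejected one, then fit a reward model with the usual Bradley-Terry pairwise loss before plugging it into an RL loop. The RewardModel class, its dimensions, and the toy data below are illustrative assumptions.

```python
# Hedged sketch of reward-model training from supervised preference pairs.
# Assumption: the supervised (gold) answer is "chosen", a model sample is
# "rejected". A real setup would score sequences with an LLM backbone; a
# tiny bag-of-tokens scorer stands in here so the example is self-contained.
import torch
import torch.nn as nn


class RewardModel(nn.Module):
    """Placeholder scalar scorer over token sequences."""

    def __init__(self, vocab_size: int = 32000, dim: int = 128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # mean-pools each sequence
        self.score = nn.Linear(dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> scalar reward per sequence: (batch,)
        return self.score(self.embed(token_ids)).squeeze(-1)


def pairwise_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Standard Bradley-Terry objective used in RLHF-style reward modeling:
    # push the chosen (gold) response to score above the rejected one.
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()


# Toy stand-ins for (gold answer tokens, sampled model output tokens).
chosen = torch.randint(0, 32000, (8, 16))
rejected = torch.randint(0, 32000, (8, 16))

model = RewardModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
for step in range(100):
    loss = pairwise_loss(model(chosen), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In a full pipeline this trained scorer would take the place of the human or LLM-judge reward inside a PPO-style RL loop; only the reward-modeling step is sketched here, and how RLA actually constructs its alignment is detailed in the paper itself.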
Anthology ID:
2025.findings-emnlp.378
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7165–7181
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.378/
DOI:
10.18653/v1/2025.findings-emnlp.378
Cite (ACL):
João Luís Lins and Jia Xu. 2025. Reinforcement Learning with Supervised Alignment. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 7165–7181, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Reinforcement Learning with Supervised Alignment (Lins & Xu, Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.378.pdf
Checklist:
2025.findings-emnlp.378.checklist.pdf