Scale Down to Speed Up: Dynamic Data Selection for Reinforcement Learning

Zhuoyue Chen, Jihai Zhang, Ben Liu, Fangquan Lin, Wotao Yin


Abstract
Optimizing data utilization remains a central challenge in applying Reinforcement Learning (RL) to Large Language Models (LLMs), directly impacting sample efficiency, training stability, and final model performance. Current approaches often rely on massive static datasets, leading to computational inefficiency and redundant gradient updates. In this paper, we propose ScalingRL, a data-centric RL framework that dynamically selects the most informative training samples to optimize RL for mathematical reasoning. Specifically, ScalingRL introduces the Data Effectiveness Score (DES), which quantitatively ranks prompts according to three complementary factors: problem difficulty, Chain-of-Thought complexity, and reward adaptability. ScalingRL then employs an adaptive curriculum scheduler that progressively adjusts the overall scale and specific mix of training prompts—balancing exploration of new, challenging data with exploitation of previously learned concepts—thereby tailoring the data distribution to the model’s current learning trajectory and performance. Experimental results demonstrate that ScalingRL achieves performance comparable to full-data training methods while requiring only 1.5K samples instead of 220K, reducing training time from 13 days to just 4 hours on A800 GPUs.
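
The abstract does not give the DES formula or the scheduler's exact form, so the sketch below is a minimal, illustrative Python reading of the idea, not the authors' method. It assumes hypothetical per-prompt statistics (pass rate as a difficulty proxy, Chain-of-Thought token length as a complexity proxy, and recent reward change as an adaptability proxy), combines them with hypothetical weights, and anneals an exploration fraction over training.

```python
import math
from dataclasses import dataclass

@dataclass
class PromptStats:
    pass_rate: float      # fraction of sampled rollouts that solve the prompt
    cot_tokens: int       # average Chain-of-Thought length, in tokens
    reward_delta: float   # change in mean reward over recent training steps

def data_effectiveness_score(s: PromptStats,
                             w_difficulty: float = 1.0,
                             w_complexity: float = 1.0,
                             w_adaptability: float = 1.0,
                             max_cot_tokens: int = 4096) -> float:
    """Combine the three DES factors into one ranking score (illustrative).

    Hypothetical proxies: difficulty peaks for prompts solved about half
    the time; complexity is normalized CoT length; reward adaptability is
    how much the reward on this prompt is still moving.
    """
    difficulty = 1.0 - abs(2.0 * s.pass_rate - 1.0)       # max at pass_rate = 0.5
    complexity = min(s.cot_tokens / max_cot_tokens, 1.0)  # normalized to [0, 1]
    adaptability = math.tanh(abs(s.reward_delta))         # bounded in [0, 1)
    return (w_difficulty * difficulty
            + w_complexity * complexity
            + w_adaptability * adaptability)

def curriculum_mix(step: int, total_steps: int) -> float:
    """Fraction of each batch drawn from new, high-DES prompts.

    Hypothetical schedule: start exploration-heavy, then anneal toward
    replaying previously learned prompts as training progresses.
    """
    return max(0.2, 1.0 - step / total_steps)

# Rank a toy prompt pool and keep the most informative samples first.
pool = {
    "p1": PromptStats(pass_rate=0.5, cot_tokens=2048, reward_delta=0.3),
    "p2": PromptStats(pass_rate=0.95, cot_tokens=256, reward_delta=0.01),
}
ranked = sorted(pool, key=lambda p: data_effectiveness_score(pool[p]), reverse=True)
print(ranked)  # p1 ranks above p2: harder, longer CoT, reward still improving
```

Under these assumptions, selection amounts to scoring the pool each epoch, taking the top-scoring prompts up to the current budget, and letting curriculum_mix decide how much of the batch explores new prompts versus replays learned ones.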
Anthology ID:
2025.findings-emnlp.412
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7806–7817
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.412/
DOI:
10.18653/v1/2025.findings-emnlp.412
Cite (ACL):
Zhuoyue Chen, Jihai Zhang, Ben Liu, Fangquan Lin, and Wotao Yin. 2025. Scale Down to Speed Up: Dynamic Data Selection for Reinforcement Learning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 7806–7817, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Scale Down to Speed Up: Dynamic Data Selection for Reinforcement Learning (Chen et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.412.pdf
Checklist:
2025.findings-emnlp.412.checklist.pdf