Efficient Hyperparameter Optimization for LLM Reinforcement Learning

Minping Chen; Bowen Xiao; Du Liang; Chuxuan Zeng; Zeyi Wen

Abstract

Hyperparameters are critical to LLM reinforcement learning (RL), but existing hyperparameter optimization (HPO) methods remain inefficient in this setting because of the massive model scale and resource-intensive training cycles. In this paper, we propose Joint Fidelity Hyperparameter Optimization (JF-HPO), which simultaneously adapts both model size and training budget as fidelity dimensions. JF-HPO is built on three components: (i) a small proxy model of the target LLM for efficient training and evaluation in each HPO trial; (ii) several carefully designed early-stopping strategies based on training dynamics; and (iii) an efficient checkpointing mechanism that eliminates redundant computation. JF-HPO significantly improves the computational efficiency of each trial (by up to 14.9×) compared with existing HPO methods, and thus achieves better predictive accuracy in most cases under the same time budget. Notably, JF-HPO delivers performance improvements ranging from 5.8% to 111.6% over VeRL Recipe.
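
To make the idea of treating training budget as a fidelity concrete, the following is a minimal sketch of generic successive halving with checkpoint resumption. It is not the authors' JF-HPO algorithm: the reward function `simulated_reward`, the learning-rate grid, and the budget schedule are all hypothetical stand-ins, and model size as a second fidelity is not modeled here. The sketch only illustrates how dropping weak configurations early and resuming survivors from their checkpoints can avoid redundant training steps.

```python
import math


def simulated_reward(lr: float, step: int) -> float:
    """Hypothetical stand-in for training a small proxy model.

    Reward grows with the number of training steps and peaks at lr = 1e-3.
    """
    quality = math.exp(-(math.log10(lr) + 3.0) ** 2)
    return quality * (1.0 - math.exp(-step / 50.0))


def successive_halving(lrs, budgets):
    """Keep the better half of the configurations at each budget level."""
    # Steps already trained per configuration (acts as a checkpoint record).
    checkpoints = {lr: 0 for lr in lrs}
    survivors = list(lrs)
    steps_run = 0
    for budget in budgets:
        scores = {}
        for lr in survivors:
            # Resume from the checkpoint: only the new steps are trained.
            steps_run += budget - checkpoints[lr]
            checkpoints[lr] = budget
            scores[lr] = simulated_reward(lr, budget)
        # Early stopping: discard the weaker half of the configurations.
        keep = max(1, len(survivors) // 2)
        survivors = sorted(survivors, key=scores.get, reverse=True)[:keep]
    return survivors[0], steps_run


best_lr, steps_run = successive_halving([1e-5, 1e-4, 1e-3, 1e-2], [50, 100, 200])
print(best_lr, steps_run)
```

In this toy setup, four configurations are trained to 50 steps, two continue to 100, and one continues to 200, so only 400 steps are run in total. Restarting each surviving trial from scratch at every budget level would cost 600 steps, and training all four configurations to the full budget would cost 800.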

Anthology ID: 2026.acl-long.1271
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 27540–27552
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1271/
Cite (ACL): Minping Chen, Bowen Xiao, Du Liang, Chuxuan Zeng, and Zeyi Wen. 2026. Efficient Hyperparameter Optimization for LLM Reinforcement Learning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 27540–27552, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Efficient Hyperparameter Optimization for LLM Reinforcement Learning (Chen et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1271.pdf
Checklist: 2026.acl-long.1271.checklist.pdf
