Entropy Scheduling in Reinforcement Learning for Large Language Models

Xingjin Wang; Hao Sun; Lu Wang; Linjing Li; Daniel Dajun Zeng

Xingjin Wang, Howe Tissue, Lu Wang, Linjing Li, Daniel Dajun Zeng

Abstract

We observe that entropy in reinforcement learning (RL) plays a role analogous to that of the learning rate in LLM training. Maintaining stable entropy, as demonstrated in DAPO, helps stabilize RL training, while rapid entropy annealing (i.e., so-called entropy collapse) accelerates local performance improvement and enables faster convergence. We argue that these two processes are not antithetical but can be effectively controlled and scheduled within a single training run, much like learning rate scheduling. We propose Entropy Scheduling (ES), which optimizes for different preset goals (e.g., k when optimizing Pass@k) by controlling and scheduling entropy at each step of the RL process. We find that maintaining stable entropy early in training and then annealing it achieves superior performance. Moreover, since stable-state entropy and annealed entropy exhibit distinctly different learning dynamics, curriculum learning can be seamlessly integrated, tailored to each entropy phase, to maximize model performance. We show that entropy scheduling is straightforward to implement and intuitive in design. Extensive experiments suggest that it delivers consistent and stable performance improvements across diverse models and algorithms.
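The abstract frames entropy scheduling like learning rate scheduling: hold entropy stable early, then anneal it. The paper's actual control mechanism is not described here, so the following is only a minimal sketch under stated assumptions. It uses a hypothetical two-phase target (constant, then cosine decay, borrowed from cosine learning-rate schedules) and a hypothetical proportional controller on an entropy-bonus coefficient, which is a swapped-in technique rather than the authors' method. The names `target_entropy` and `EntropyCoefController`, and every default value, are illustrative only.

```python
import math


def target_entropy(step: int, total_steps: int, stable_frac: float = 0.5,
                   h_stable: float = 0.5, h_final: float = 0.05) -> float:
    """Hypothetical two-phase entropy target.

    Phase 1 (step < stable_frac * total_steps): hold entropy at h_stable.
    Phase 2: cosine-anneal from h_stable down to h_final by total_steps.
    """
    switch = int(stable_frac * total_steps)
    if step < switch:
        return h_stable
    # Fraction of the annealing phase completed, clamped to [0, 1].
    progress = (step - switch) / max(1, total_steps - switch)
    progress = min(progress, 1.0)
    # Equals h_stable at progress=0 and h_final at progress=1.
    return h_final + 0.5 * (h_stable - h_final) * (1.0 + math.cos(math.pi * progress))


class EntropyCoefController:
    """Hypothetical proportional controller for an entropy-bonus coefficient.

    If measured policy entropy is below the target, the bonus grows;
    if it is above, the bonus shrinks. The coefficient is clamped to a range.
    """

    def __init__(self, coef: float = 1e-3, lr: float = 1e-2,
                 coef_min: float = 0.0, coef_max: float = 0.1) -> None:
        self.coef = coef
        self.lr = lr
        self.coef_min = coef_min
        self.coef_max = coef_max

    def update(self, measured_entropy: float, target: float) -> float:
        # Proportional step on the entropy error, then clamp.
        self.coef += self.lr * (target - measured_entropy)
        self.coef = min(max(self.coef, self.coef_min), self.coef_max)
        return self.coef
```

In a training loop one would call `target_entropy(step, total_steps)` each step, measure the mean token-level entropy of the current rollouts, and pass both to `update` to obtain the entropy-bonus coefficient for the loss. Cosine decay is just one choice; a linear or step decay fits the same interface. The switch point `stable_frac` is also a natural place to align a curriculum change with the phase boundary, in the spirit of the abstract, though how the paper does this is not stated here.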

Anthology ID: 2026.findings-acl.206
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4239–4251
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.206/
Cite (ACL): Xingjin Wang, Howe Tissue, Lu Wang, Linjing Li, and Daniel Dajun Zeng. 2026. Entropy Scheduling in Reinforcement Learning for Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 4239–4251, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Entropy Scheduling in Reinforcement Learning for Large Language Models (Wang et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.206.pdf
Checklist: 2026.findings-acl.206.checklist.pdf