Hop, skip, jump to Convergence: Dynamics of Learning Rate Transitions for Improved Training of Large Language Models

Shreyas Subramanian, Vignesh Ganapathiraman, Corey D Barrett


Abstract
Various types of learning rate (LR) schedulers are used for training or fine-tuning of Large Language Models today. In practice, several mid-flight changes to the LR schedule are required, either manually or through careful choices of warmup steps, peak LR, decay type, and restarts. To study this further, we consider the effect of switching the learning rate at a predetermined time during training, which we refer to as “SkipLR”. We model SGD as a stochastic gradient flow and show that, when starting from the same initial parameters, switching the learning rate causes the loss curves to contract towards each other. We demonstrate this theoretically for some simple cases and empirically on large language models. Our analysis provides insight into how learning rate schedules affect training dynamics, and could inform the design of new schedules that accelerate convergence.
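To make the setup concrete, the following is a minimal sketch (not the authors' code) of switching the learning rate at a predetermined step, in the spirit of the SkipLR setup described above, using PyTorch's standard LambdaLR scheduler. The switch step, the two rate factors, the stand-in model, and the dummy loss are all illustrative assumptions.

# Minimal sketch, assuming a single predetermined LR transition during training.
import torch

model = torch.nn.Linear(10, 1)                               # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)     # base LR

SWITCH_STEP = 1000   # hypothetical step at which the LR transition happens
FACTOR_BEFORE = 1.0  # multiplier on the base LR before the switch
FACTOR_AFTER = 0.1   # multiplier on the base LR after the switch

def skip_lr(step: int) -> float:
    # Constant LR multiplier until SWITCH_STEP, then a single jump.
    return FACTOR_BEFORE if step < SWITCH_STEP else FACTOR_AFTER

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=skip_lr)

for step in range(2000):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).pow(2).mean()           # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()   # advances the step index used by skip_lr

Under this sketch, the two training runs compared in the paper would correspond to keeping FACTOR_BEFORE throughout versus applying the jump at SWITCH_STEP, both starting from the same initial parameters.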
Anthology ID:
2024.findings-emnlp.954
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
16349–16362
URL:
https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.954/
DOI:
10.18653/v1/2024.findings-emnlp.954
Cite (ACL):
Shreyas Subramanian, Vignesh Ganapathiraman, and Corey D Barrett. 2024. Hop, skip, jump to Convergence: Dynamics of Learning Rate Transitions for Improved Training of Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 16349–16362, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Hop, skip, jump to Convergence: Dynamics of Learning Rate Transitions for Improved Training of Large Language Models (Subramanian et al., Findings 2024)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2024.findings-emnlp.954.pdf