SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization

Wooin Lee, Hyun-Tae Kim


Abstract
The AdamW optimizer, while standard for LLM pretraining, is a critical memory bottleneck, consuming optimizer states equivalent to twice the model’s size. Although light-state optimizers like SinkGD attempt to address this issue, we identify the embedding layer dilemma: these methods fail to handle the sparse, high-variance gradients inherent to embeddings, forcing a hybrid design that reverts to AdamW and partially negates the memory gains. We propose SAGE (Sign Adaptive GradiEnt), a novel optimizer that resolves this dilemma by replacing AdamW in this hybrid structure. SAGE combines a Lion-style update direction with a new, memory-efficient O(d) adaptive scale. This scale acts as a "safe damper," provably bounded by 1.0, which tames high-variance dimensions more effectively than existing methods. This superior stability allows SAGE to achieve better convergence. On Llama models up to 1.3B parameters, our SAGE-based hybrid achieves new state-of-the-art perplexity, outperforming all baselines, including the SinkGD hybrid, while significantly reducing optimizer state memory.
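The "twice the model's size" figure follows from AdamW keeping two moment estimates per parameter. The sketch below works through that arithmetic only. It is not the authors' code and does not model SAGE's own state layout, which the abstract does not specify. The function name `optimizer_state_bytes`, the 4-byte (fp32) state width, and the single-state comparison row are illustrative assumptions.

```python
# Back-of-the-envelope optimizer state memory, assuming fp32 (4-byte) states.
# Hypothetical helper for illustration; not taken from the paper.

def optimizer_state_bytes(num_params: int, states_per_param: int,
                          bytes_per_value: int = 4) -> int:
    """Bytes needed to hold `states_per_param` buffers of size `num_params`."""
    return num_params * states_per_param * bytes_per_value


NUM_PARAMS = 1_300_000_000  # 1.3B, the largest model size named in the abstract

# AdamW keeps a first-moment and a second-moment buffer: 2 states per parameter.
adamw_bytes = optimizer_state_bytes(NUM_PARAMS, states_per_param=2)

# A generic single-buffer optimizer (e.g. momentum only), shown for comparison.
single_state_bytes = optimizer_state_bytes(NUM_PARAMS, states_per_param=1)

print(f"AdamW states:        {adamw_bytes / 1e9:.1f} GB")
print(f"Single-state states: {single_state_bytes / 1e9:.1f} GB")
```

Under these assumptions the AdamW states alone occupy twice the memory of the fp32 weights, which is the bottleneck the abstract refers to.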
Anthology ID:
2026.findings-acl.923
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
18525–18537
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.923/
Cite (ACL):
Wooin Lee and Hyun-Tae Kim. 2026. SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization. In Findings of the Association for Computational Linguistics: ACL 2026, pages 18525–18537, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization (Lee & Kim, Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.923.pdf
Checklist:
 2026.findings-acl.923.checklist.pdf