Better Embeddings with Coupled Adam

Felix Stollenwerk, Tobias Stollenwerk


Abstract
Despite their remarkable capabilities, LLMs learn word representations that exhibit the undesirable yet poorly understood property of anisotropy. In this paper, we argue that the second moment in Adam is a cause of anisotropic embeddings, and suggest a modified optimizer called Coupled Adam to mitigate the problem. Our experiments demonstrate that Coupled Adam significantly improves the quality of embeddings, while also leading to better upstream and downstream performance on sufficiently large datasets.
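To make the abstract's claim concrete, the sketch below shows where the per-parameter second moment enters a standard Adam update for an embedding matrix, and one plausible way to "couple" it by sharing the second moment across the vocabulary axis. This is a hypothetical illustration under our own assumptions, not the paper's exact Coupled Adam rule; the names `adam_step` and `coupled_adam_step` are ours.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One standard Adam step: each parameter keeps its own second moment v."""
    m = b1 * m + (1 - b1) * g          # first moment (EMA of gradients)
    v = b2 * v + (1 - b2) * g ** 2     # second moment (EMA of squared gradients)
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def coupled_adam_step(E, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Hypothetical coupled variant for an embedding matrix E of shape
    (vocab, dim): the bias-corrected second moment is averaged over the
    vocabulary axis, so all embedding vectors share one scale per step
    instead of each token getting its own adaptive step size."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    v_shared = v_hat.mean(axis=0, keepdims=True)  # couple across the vocabulary
    E = E - lr * m_hat / (np.sqrt(v_shared) + eps)
    return E, m, v
```

The intuition, per the abstract, is that token-specific second moments give rare and frequent tokens very different effective step sizes, which can push embeddings into a common direction; sharing the second moment removes that per-token asymmetry.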
Anthology ID:
2025.acl-long.1321
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
27219–27236
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1321/
Cite (ACL):
Felix Stollenwerk and Tobias Stollenwerk. 2025. Better Embeddings with Coupled Adam. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 27219–27236, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Better Embeddings with Coupled Adam (Stollenwerk & Stollenwerk, ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1321.pdf