Vocabulary-level Memory Efficiency for Language Model Fine-tuning

Miles Williams, Nikolaos Aletras


Abstract
The extensive memory footprint of language model (LM) fine-tuning poses a challenge for both researchers and practitioners. LMs use an embedding matrix to represent their extensive vocabularies, which forms a substantial proportion of the model parameters. While previous work on memory-efficient fine-tuning has focused on minimizing the number of trainable parameters, reducing the memory footprint of the embedding matrix has yet to be explored. We first demonstrate that a significant proportion of the vocabulary remains unused during fine-tuning. We then propose a simple yet effective approach that leverages this finding to minimize memory usage. We show that our approach provides substantial reductions in memory usage across a wide range of models and tasks. Notably, our approach does not impact downstream task performance, while allowing more efficient use of computational resources.
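The core idea described in the abstract, keeping embedding rows only for token IDs that actually occur in the fine-tuning data, can be illustrated with a minimal PyTorch sketch. This is not the authors' released implementation; the helper names (`collect_used_ids`, `shrink_embedding`) and the toy data are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): restrict an embedding matrix
# to the token IDs that appear in the fine-tuning data, with an old->new ID map.
import torch
import torch.nn as nn


def collect_used_ids(tokenized_examples):
    """Gather the sorted set of token IDs that occur anywhere in the dataset."""
    used = set()
    for ids in tokenized_examples:
        used.update(ids)
    return sorted(used)


def shrink_embedding(embedding: nn.Embedding, used_ids):
    """Build a smaller embedding over only the used IDs, plus an old->new ID map."""
    rows = torch.tensor(used_ids, dtype=torch.long)
    new_embedding = nn.Embedding(len(used_ids), embedding.embedding_dim)
    with torch.no_grad():
        new_embedding.weight.copy_(embedding.weight[rows])
    id_map = {old: new for new, old in enumerate(used_ids)}
    return new_embedding, id_map


# Toy example: a 50k-token vocabulary where only a handful of IDs are ever used.
full = nn.Embedding(50_000, 768)
examples = [[101, 2023, 2003, 102], [101, 2178, 7953, 102]]
used_ids = collect_used_ids(examples)
small, id_map = shrink_embedding(full, used_ids)
remapped = [[id_map[i] for i in ids] for ids in examples]
print(small.weight.shape, remapped)  # torch.Size([7, 768]) and remapped IDs
```

In practice the same reduction would also apply to a tied or separate output (LM head) matrix, and the ID map would be used to remap inputs before training; those details are omitted here.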
Anthology ID:
2025.repl4nlp-1.14
Volume:
Proceedings of the 10th Workshop on Representation Learning for NLP (RepL4NLP-2025)
Month:
May
Year:
2025
Address:
Albuquerque, NM
Editors:
Vaibhav Adlakha, Alexandra Chronopoulou, Xiang Lorraine Li, Bodhisattwa Prasad Majumder, Freda Shi, Giorgos Vernikos
Venues:
RepL4NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
185–196
URL:
https://preview.aclanthology.org/landing_page/2025.repl4nlp-1.14/
Cite (ACL):
Miles Williams and Nikolaos Aletras. 2025. Vocabulary-level Memory Efficiency for Language Model Fine-tuning. In Proceedings of the 10th Workshop on Representation Learning for NLP (RepL4NLP-2025), pages 185–196, Albuquerque, NM. Association for Computational Linguistics.
Cite (Informal):
Vocabulary-level Memory Efficiency for Language Model Fine-tuning (Williams & Aletras, RepL4NLP 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.repl4nlp-1.14.pdf