SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics

Arash Ardakani; Altan Haan; Shangyin Tan; Doru Thom Popovici; Alvin Cheung; Costin Iancu; Koushik Sen

SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics

Arash Ardakani, Altan Haan, Shangyin Tan, Doru Thom Popovici, Alvin Cheung, Costin Iancu, Koushik Sen

Abstract

Transformer-based models, such as BERT and ViT, have achieved state-of-the-art results across different natural language processing (NLP) and computer vision (CV) tasks. However, these models are extremely memory intensive during their fine-tuning process, making them difficult to deploy on GPUs with limited memory resources. To address this issue, we introduce a new tool called SlimFit that reduces the memory requirements of these models by dynamically analyzing their training dynamics and freezing less-contributory layers during fine-tuning. The layers to freeze are chosen using a runtime inter-layer scheduling algorithm. This allows SlimFit to freeze up to 95% of layers and reduce the overall on-device GPU memory usage of transformer-based models such as ViT and BERT by an average of 2.2x, across different NLP and CV benchmarks/datasets such as GLUE, SQuAD 2.0, CIFAR-10, CIFAR-100 and ImageNet with an average degradation of 0.2% in accuracy. For such NLP and CV tasks, SlimFit can reduce up to 3.1x the total on-device memory usage with an accuracy degradation of only up to 0.4%. As a result, while fine-tuning of ViT on ImageNet and BERT on SQuAD 2.0 with a batch size of 128 requires 3 and 2 32GB GPUs, respectively, SlimFit enables fine-tuning them on a single 32GB GPU without any significant accuracy degradation. The code of SlimFit is available at https://github.com/arashardakani/SlimFit.

Anthology ID:: 2024.naacl-long.345
Volume:: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month:: June
Year:: 2024
Address:: Mexico City, Mexico
Editors:: Kevin Duh, Helena Gomez, Steven Bethard
Venue:: NAACL
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 6218–6236
Language:
URL:: https://aclanthology.org/2024.naacl-long.345
DOI:
Bibkey:
Cite (ACL):: Arash Ardakani, Altan Haan, Shangyin Tan, Doru Thom Popovici, Alvin Cheung, Costin Iancu, and Koushik Sen. 2024. SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6218–6236, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):: SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics (Ardakani et al., NAACL 2024)
Copy Citation:
PDF:: https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.naacl-long.345.pdf