LoRAExit: Empowering Dynamic Modulation of LLMs in Resource-limited Settings using Low-rank Adapters

Jiacheng Liu, Peng Tang, Xiaofeng Hou, Chao Li, Pheng-Ann Heng


Abstract
Large Language Models (LLMs) have exhibited remarkable performance across various natural language processing tasks. However, deploying LLMs in resource-limited settings remains a challenge. While early-exit techniques offer an effective approach, they often require compromised training methods that result in sub-optimal performance. Multi-model methods, on the other hand, achieve improved results but suffer from significant inference latency and memory consumption. In this paper, we propose LoRAExit, a novel dynamic inference architecture that leverages low-rank adapters for the efficient deployment of LLMs. LoRAExit decouples the training of multiple exit interfaces, enabling each exit to be optimized separately and thereby fundamentally addressing the performance issues of early-exit networks. Moreover, we introduce a superior-exit guided distillation method that effectively utilizes models of different sizes to further enhance the performance of early exits. Experimental results demonstrate that LoRAExit significantly improves LLM performance when deployed in resource-limited settings.
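To make the abstract's idea concrete, below is a minimal, hypothetical sketch of attaching LoRA-adapted exit heads to a frozen backbone and exiting at the first sufficiently confident layer. It is not the paper's implementation; the class names (LoRALinear, ExitHead), the confidence-threshold exit criterion, and all hyperparameters are illustrative assumptions. Because the backbone stays frozen and each exit has its own adapter, the exits can be trained independently, which is the decoupling the abstract describes.

```python
# Hypothetical sketch of early exits via low-rank adapters (not the authors' code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + B (A x)."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # base weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T

class ExitHead(nn.Module):
    """Per-exit LoRA adapter plus a language-model head at an intermediate layer."""
    def __init__(self, hidden_size: int, vocab_size: int, rank: int = 8):
        super().__init__()
        self.adapter = LoRALinear(nn.Linear(hidden_size, hidden_size), rank)
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden):
        return self.lm_head(self.adapter(hidden))

@torch.no_grad()
def early_exit_step(layers, exit_heads, hidden, confidence_threshold=0.9):
    """Run transformer layers in order; return logits from the first exit whose
    top-1 probability exceeds the threshold (an assumed exit criterion)."""
    logits = None
    for layer, head in zip(layers, exit_heads):
        hidden = layer(hidden)
        logits = head(hidden[:, -1])                     # next-token logits at this depth
        if torch.softmax(logits, dim=-1).max().item() >= confidence_threshold:
            return logits                                # confident enough: exit early
    return logits                                        # otherwise fall through to the deepest exit
```

The superior-exit guided distillation mentioned in the abstract would, in a setup like this, supervise shallower exit heads with the outputs of a stronger exit, but that training loop is not sketched here.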
Anthology ID:
2024.findings-emnlp.539
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9211–9225
URL:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.539/
DOI:
10.18653/v1/2024.findings-emnlp.539
Cite (ACL):
Jiacheng Liu, Peng Tang, Xiaofeng Hou, Chao Li, and Pheng-Ann Heng. 2024. LoRAExit: Empowering Dynamic Modulation of LLMs in Resource-limited Settings using Low-rank Adapters. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 9211–9225, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
LoRAExit: Empowering Dynamic Modulation of LLMs in Resource-limited Settings using Low-rank Adapters (Liu et al., Findings 2024)
PDF:
https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-emnlp.539.pdf