@inproceedings{kim-etal-2024-ra,
title = "{RA}-{L}o{RA}: Rank-Adaptive Parameter-Efficient Fine-Tuning for Accurate 2-bit Quantized Large Language Models",
author = "Kim, Minsoo and
Lee, Sihwa and
Sung, Wonyong and
Choi, Jungwook",
editor = "Ku, Lun-Wei and
Martins, Andre and
Srikumar, Vivek",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.933/",
doi = "10.18653/v1/2024.findings-acl.933",
pages = "15773--15786",
    abstract = "Deploying large language models (LLMs) with their extensive parameters and high memory demands challenges computational efficiency, particularly in fine-tuning for specific applications with limited resources. Techniques like Low-Rank Adaptation (LoRA) help by training a smaller, modifiable extension of the base model to reduce memory usage. However, combining quantization with LoRA, especially in low-bit scenarios, can lead to performance losses due to quantization errors. Our innovative Rank-Adaptive LoRA (RA-LoRA) addresses this by dynamically adjusting the adapter's rank using rank-subspace analysis, optimizing performance with fewer parameters. We tested RA-LoRA on state-of-the-art LLMs for 2-bit efficient fine-tuning, showing it can improve model accuracy with minimal trainable parameters, marking a leap forward in quantization-aware fine-tuning methods and highlighting the significance of rank dynamics in optimizing quantized LLMs."
}
[RA-LoRA: Rank-Adaptive Parameter-Efficient Fine-Tuning for Accurate 2-bit Quantized Large Language Models](https://aclanthology.org/2024.findings-acl.933/) (Kim et al., Findings 2024)
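
The abstract describes LoRA adapters trained on top of a low-bit quantized base model, with RA-LoRA assigning ranks per layer via rank-subspace analysis. The sketch below is only a rough illustration of that setup, not the authors' code: a frozen, fake-quantized linear layer with a trainable low-rank adapter whose rank is a per-layer hyperparameter. The rank-subspace analysis RA-LoRA uses to choose those ranks is not reproduced here, and all names, sizes, and rank values are made up for the example.

```python
# Minimal sketch (not the paper's implementation): a frozen, fake-quantized
# linear layer plus a trainable low-rank (LoRA) adapter with a per-layer rank.
# RA-LoRA's rank-subspace analysis for choosing `rank` is NOT reproduced here;
# in this sketch `rank` is just a hand-set hyperparameter.
import torch
import torch.nn as nn


def fake_quantize(w: torch.Tensor, n_bits: int = 2) -> torch.Tensor:
    """Symmetric per-tensor fake quantization, used only to mimic a low-bit base weight."""
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 1 for 2-bit symmetric
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale


class LoRAQuantLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int,
                 n_bits: int = 2, alpha: float = 16.0):
        super().__init__()
        # Frozen low-bit base weight; its quantization error stays in the forward pass.
        w = torch.randn(out_features, in_features) * 0.02
        self.register_buffer("w_q", fake_quantize(w, n_bits))
        # Trainable low-rank update: delta_W = B @ A, with A (rank x in) and B (out x rank).
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.w_q.t()
        update = (x @ self.lora_A.t()) @ self.lora_B.t() * self.scaling
        return base + update


# Per-layer ranks stand in for a rank-adaptive assignment (values are illustrative only).
ranks = {"layer0": 8, "layer1": 16, "layer2": 4}
layers = nn.ModuleDict({name: LoRAQuantLinear(64, 64, rank=r) for name, r in ranks.items()})

x = torch.randn(2, 64)
for layer in layers.values():
    x = layer(x)
print(x.shape)  # torch.Size([2, 64]); only lora_A / lora_B require gradients
```

In this toy setup only the adapter matrices are trainable, which mirrors the memory argument in the abstract: the quantized base weights are stored as buffers and never updated, while the number of trainable parameters scales with the chosen per-layer ranks.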