RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization

Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng


Abstract
Low-Rank Adaptation (LoRA), as a representative Parameter-Efficient Fine-Tuning (PEFT) method, significantly enhances training efficiency by updating only a small portion of the weights in Large Language Models (LLMs). Recently, weight-only quantization techniques have also been applied to LoRA methods to reduce the memory footprint of fine-tuning. However, applying weight-activation quantization to the LoRA pipeline is under-explored, and we observe substantial performance degradation primarily due to the presence of activation outliers. In this work, we propose RoLoRA, the first LoRA-based scheme to apply rotation for outlier elimination and then fine-tune rotated, outlier-free LLMs for effective weight-activation quantization. Different from previous work that tackles the outlier challenge from a post-training perspective, we propose rotation-aware fine-tuning to preserve the outlier-free characteristics brought by the rotation operations. RoLoRA improves low-bit LoRA convergence and post-training quantization robustness in weight-activation settings. RoLoRA is evaluated across various LLM series (LLaMA2, LLaMA3, LLaVA-1.5), tasks, and quantization settings, achieving up to a 29.5% absolute accuracy gain for 4-bit weight-activation quantized LLaMA2-13B on commonsense reasoning tasks compared to the LoRA baseline. We further demonstrate its effectiveness on Large Multimodal Models (LMMs) and its compatibility with advanced LoRA variants.
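The rotation idea the abstract refers to can be illustrated with a short PyTorch sketch. The snippet below is a toy illustration, not the authors' implementation: it assumes a random orthogonal rotation obtained via QR (the paper applies structured Hadamard-style rotations inside the transformer), a naive symmetric per-tensor fake-quantizer, and synthetic activations with injected channel outliers. It shows the two facts the method builds on: absorbing an orthogonal rotation into a linear layer's weights leaves the output unchanged, and the rotated activations are much friendlier to low-bit weight-activation (e.g., W4A4) quantization.

# Minimal sketch (not the authors' code) of the rotation idea behind RoLoRA.
# Assumptions for illustration: a random orthogonal rotation from QR, a naive
# symmetric per-tensor fake-quantizer, and synthetic outlier-heavy activations.
import torch

torch.manual_seed(0)
d, n_tokens = 256, 512

# Synthetic activations with a few outlier channels, as commonly seen in LLMs.
x = torch.randn(n_tokens, d)
x[:, :4] *= 50.0                        # inject channel-wise outliers
W = torch.randn(d, d) * 0.02            # linear layer weight, y = x @ W.T

# Random orthogonal rotation R (Q factor of a Gaussian matrix).
R, _ = torch.linalg.qr(torch.randn(d, d))

x_rot = x @ R                           # rotate activations
W_rot = W @ R                           # absorb the same rotation into the weight

# Computational invariance: (x R)(W R)^T = x R R^T W^T = x W^T.
assert torch.allclose(x @ W.T, x_rot @ W_rot.T, atol=1e-3)

def fake_quant(t, bits=4):
    """Naive symmetric per-tensor fake-quantization (illustration only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = t.abs().max() / qmax
    return torch.round(t / scale).clamp(-qmax - 1, qmax) * scale

# W4A4 error with and without rotation: rotated, outlier-free activations
# quantize with much lower error at the same bit-width.
y_ref = x @ W.T
err_plain = (fake_quant(x) @ fake_quant(W).T - y_ref).abs().mean()
err_rot = (fake_quant(x_rot) @ fake_quant(W_rot).T - y_ref).abs().mean()
print(f"mean |error| without rotation: {err_plain:.4f}")
print(f"mean |error| with rotation:    {err_rot:.4f}")

Per the abstract, RoLoRA then performs rotation-aware LoRA fine-tuning on the rotated model so that the outlier-free characteristics are preserved through fine-tuning and carried into post-training weight-activation quantization.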
Anthology ID:
2024.findings-emnlp.444
Original:
2024.findings-emnlp.444v1
Version 2:
2024.findings-emnlp.444v2
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7563–7576
URL:
https://preview.aclanthology.org/add-emnlp-2024-awards/2024.findings-emnlp.444/
DOI:
10.18653/v1/2024.findings-emnlp.444
Cite (ACL):
Xijie Huang, Zechun Liu, Shih-Yang Liu, and Kwang-Ting Cheng. 2024. RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 7563–7576, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization (Huang et al., Findings 2024)
PDF:
https://preview.aclanthology.org/add-emnlp-2024-awards/2024.findings-emnlp.444.pdf