Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs

Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Lv Kaokao, Yi Liu
Abstract
Large Language Models (LLMs) have demonstrated exceptional proficiency in language-related tasks, but their deployment poses significant challenges due to substantial memory and storage requirements. Weight-only quantization has emerged as a promising solution to address these challenges. Previous research suggests that fine-tuning whether each weight is rounded up or down can enhance performance. In this study, we introduce SignRound, a method that utilizes signed gradient descent (SignSGD) to optimize rounding values and weight clipping within just 200 steps. SignRound integrates the advantages of Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ), achieving exceptional results across 2 to 4 bits while maintaining low tuning costs and avoiding additional inference overhead. For example, SignRound achieves absolute accuracy improvements of 6.91% to 33.22% at 2 bits, measured as the average zero-shot accuracy across 11 tasks. It also demonstrates strong generalization to recent models, achieving near-lossless 4-bit quantization in most scenarios. The source code will be made publicly available.
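
For illustration only, the sketch below shows the general idea in PyTorch: a per-weight rounding offset and a per-channel clipping scale are tuned with signed gradient descent (stepping by the sign of the gradient rather than its magnitude) so that the fake-quantized layer reproduces the full-precision output on a small calibration batch. The helper names (`ste_round`, `fake_quant`, `signround_layer`), the symmetric quantization scheme, the loss, and the hyperparameters are assumptions made for this sketch and do not reproduce the authors' released implementation.

```python
import torch


def ste_round(x):
    # Straight-through estimator: round in the forward pass, identity gradient in the backward pass.
    return (torch.round(x) - x).detach() + x


def fake_quant(weight, v, alpha, bits=4):
    # Symmetric per-output-channel fake quantization with a learnable rounding
    # perturbation v (kept in [-0.5, 0.5]) and a learnable clipping scale alpha.
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().amax(dim=1, keepdim=True) * alpha / qmax
    w_int = torch.clamp(ste_round(weight / scale + v), -qmax - 1, qmax)
    return w_int * scale


def signround_layer(weight, x_calib, steps=200, lr=5e-3, bits=4):
    # Tune rounding offsets and clipping scales with signed gradient descent so that
    # the quantized layer reproduces the full-precision output on calibration inputs.
    v = torch.zeros_like(weight, requires_grad=True)             # per-weight rounding offsets
    alpha = torch.ones(weight.shape[0], 1, requires_grad=True)   # per-channel clipping scales
    y_ref = x_calib @ weight.t()                                 # full-precision reference output
    for _ in range(steps):
        w_q = fake_quant(weight, torch.clamp(v, -0.5, 0.5), alpha, bits)
        loss = torch.nn.functional.mse_loss(x_calib @ w_q.t(), y_ref)
        loss.backward()
        with torch.no_grad():
            v -= lr * v.grad.sign()          # SignSGD: step along the sign of the gradient
            alpha -= lr * alpha.grad.sign()
        v.grad = None
        alpha.grad = None
    return fake_quant(weight, torch.clamp(v, -0.5, 0.5), alpha, bits).detach()
```

In such a setup, `x_calib` would be a small batch of layer inputs collected during calibration, and the returned tensor would replace the layer's original weight at inference time.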
Anthology ID: 2024.findings-emnlp.662
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 11332–11350
URL: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.findings-emnlp.662/
DOI: 10.18653/v1/2024.findings-emnlp.662
Cite (ACL): Wenhua Cheng, Weiwei Zhang, Haihao Shen, Yiyang Cai, Xin He, Lv Kaokao, and Yi Liu. 2024. Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 11332–11350, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs (Cheng et al., Findings 2024)
PDF: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.findings-emnlp.662.pdf
Software: 2024.findings-emnlp.662.software.zip