Abstract
Many quantization methods have emerged to address the problem that the heavy computational and storage costs of Large Language Models (LLMs) hinder their deployment. However, their accuracy still falls short of what the academic and industrial communities require. In this work, we propose ATQ, an INT8 weight-activation quantization method for LLMs that achieves almost lossless accuracy. We employ a mathematically equivalent transformation and the triangle inequality to bound the weight-activation quantization error by the sum of a weight quantization error and an activation quantization error. For the weight part, the transformed weights are quantized along the in-feature dimension, and the quantization error is compensated by updating the subsequent in-features. For the activation part, the transformed activations lie in a normal range and can be quantized easily. Comparison experiments demonstrate that ATQ is almost lossless in accuracy on the OPT and LLaMA families under W8A8 quantization settings. The increase in perplexity is within 1 and the accuracy degradation is within 0.5 percentage points even in the worst case.
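As a minimal sketch of the bound described in the abstract (assuming activations $X$, weights $W$, an invertible transform $T$ with $\hat{X} = X T^{-1}$ and $\hat{W} = T W$, and an INT8 quantizer $Q(\cdot)$; these symbols are illustrative and may differ from the paper's exact notation), the triangle inequality gives

$$
\|XW - Q(\hat{X})\,Q(\hat{W})\|
= \|\hat{X}\hat{W} - Q(\hat{X})\,Q(\hat{W})\|
\le \underbrace{\|\hat{X}\,(\hat{W} - Q(\hat{W}))\|}_{\text{weight quantization error}}
+ \underbrace{\|(\hat{X} - Q(\hat{X}))\,Q(\hat{W})\|}_{\text{activation quantization error}},
$$

so controlling each term separately (per-in-feature weight quantization with error compensation, and quantization of the well-behaved transformed activations) bounds the overall weight-activation error.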
- Anthology ID:
- 2024.findings-emnlp.1001
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 17187–17194
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.findings-emnlp.1001/
- DOI:
- 10.18653/v1/2024.findings-emnlp.1001
- Cite (ACL):
- Yundong Gai and Ping Li. 2024. ATQ: Activation Transformation for Weight-Activation Quantization of Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 17187–17194, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- ATQ: Activation Transformation for Weight-Activation Quantization of Large Language Models (Gai & Li, Findings 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.findings-emnlp.1001.pdf