Abstract
Many quantization methods have emerged to address the problem that the heavy computational and storage costs of Large Language Models (LLMs) hinder their deployment. However, their accuracy still falls short of what the academic and industrial communities require. In this work, we propose ATQ, an INT8 weight-activation quantization method for LLMs that achieves almost lossless accuracy. We employ a mathematically equivalent transformation and the triangle inequality to bound the weight-activation quantization error by the sum of a weight quantization error and an activation quantization error. For the weight part, the transformed weights are quantized along the in-feature dimension, and the quantization error is compensated by updating the subsequent in-features. For the activation part, the transformed activations lie in a normal range and can be quantized easily. Comparison experiments demonstrate that ATQ is almost lossless in accuracy on the OPT and LLaMA families under W8A8 quantization settings. The increase in perplexity is within 1 and the accuracy degradation is within 0.5 percentage points even in the worst case.
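As a minimal sketch of the bound described in the abstract (assuming activations $X$, weights $W$, an invertible transform $T$ with $\hat{X} = X T^{-1}$ and $\hat{W} = T W$, and an INT8 quantizer $Q(\cdot)$; these symbols are illustrative and may differ from the paper's exact notation), the triangle inequality gives

$$
\|XW - Q(\hat{X})\,Q(\hat{W})\|
= \|\hat{X}\hat{W} - Q(\hat{X})\,Q(\hat{W})\|
\le \underbrace{\|\hat{X}\,(\hat{W} - Q(\hat{W}))\|}_{\text{weight quantization error}}
+ \underbrace{\|(\hat{X} - Q(\hat{X}))\,Q(\hat{W})\|}_{\text{activation quantization error}},
$$

so controlling each term separately (per-in-feature weight quantization with error compensation, and quantization of the well-behaved transformed activations) bounds the overall weight-activation error.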
- Anthology ID:
- 2024.findings-emnlp.1001
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2024
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 17187–17194
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.findings-emnlp.1001/
- DOI:
- 10.18653/v1/2024.findings-emnlp.1001
- Cite (ACL):
- Yundong Gai and Ping Li. 2024. ATQ: Activation Transformation for Weight-Activation Quantization of Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 17187–17194, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- ATQ: Activation Transformation for Weight-Activation Quantization of Large Language Models (Gai & Li, Findings 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.findings-emnlp.1001.pdf