Zero-shot Sharpness-Aware Quantization for Pre-trained Language Models
Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao
Abstract
Quantization is a promising approach for reducing memory overhead and accelerating inference, especially in large pre-trained language model (PLM) scenarios. Meanwhile, the lack of access to the original training data, due to security and privacy concerns, has given rise to the demand for zero-shot quantization. Most cutting-edge zero-shot quantization methods primarily 1) apply to computer vision tasks, and 2) neglect the overfitting problem in the generative adversarial learning process, leading to sub-optimal performance. Motivated by this, we propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs. The key algorithm for solving ZSAQ is SAM-SGA optimization, which aims to improve quantization accuracy and model generalization by optimizing a minimax problem. We theoretically prove the convergence rate for this minimax optimization problem, and the result can also be applied to other nonconvex-PL minimax optimization frameworks. Extensive experiments on 11 tasks demonstrate that our method brings consistent and significant performance gains on both discriminative and generative PLMs, i.e., up to a +6.98 average score. Furthermore, we empirically validate that our method can effectively improve model generalization.
- Anthology ID:
- 2023.emnlp-main.696
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 11305–11327
- URL:
- https://aclanthology.org/2023.emnlp-main.696
- DOI:
- 10.18653/v1/2023.emnlp-main.696
- Cite (ACL):
- Miaoxi Zhu, Qihuang Zhong, Li Shen, Liang Ding, Juhua Liu, Bo Du, and Dacheng Tao. 2023. Zero-shot Sharpness-Aware Quantization for Pre-trained Language Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 11305–11327, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Zero-shot Sharpness-Aware Quantization for Pre-trained Language Models (Zhu et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2023.emnlp-main.696.pdf
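The abstract describes SAM-SGA as a minimax (perturb-then-descend) optimization for sharpness-aware quantization. As a rough illustration only, the sketch below shows a generic SAM-style two-step update in PyTorch; the names `sam_step`, `loss_fn`, and `rho` are hypothetical placeholders, and this is not the paper's actual ZSAQ/SAM-SGA algorithm.

```python
# Minimal, generic SAM-style update sketch (not the paper's SAM-SGA implementation).
import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    # Step 1 (inner maximization): compute gradients and perturb weights
    # toward the locally highest loss within an L2 ball of radius rho.
    loss = loss_fn(model, batch)
    loss.backward()
    grad_norm = torch.norm(
        torch.stack([p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]),
        p=2,
    )
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)  # ascend to the "sharp" nearby point
            eps.append(e)
    model.zero_grad()

    # Step 2 (outer minimization): gradients at the perturbed point,
    # then restore the original weights and take the descent step.
    loss_fn(model, batch).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)  # undo the perturbation
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```

In a zero-shot quantization setting, `loss_fn` would typically measure the mismatch between a quantized model and its full-precision counterpart; the exact objective and adversarial data-generation procedure used by ZSAQ are detailed in the paper itself.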