Abstract
We introduce a novel run-time method for significantly reducing the accuracy loss associated with quantizing BERT-like models to 8-bit integers. Existing quantization methods either modify the training procedure or require an additional calibration step, in which parameters are adjusted using a selected held-out dataset. Our method permits taking advantage of quantization without the need for either of these adjustments. We present results on several NLP tasks demonstrating the usefulness of this technique.
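For context, the dynamic-quantization setting the abstract refers to can be illustrated with off-the-shelf tooling. The sketch below is not the authors' zero-shot technique; it is a minimal example of standard post-training dynamic int8 quantization in PyTorch, assuming a placeholder `bert-base-uncased` checkpoint from Hugging Face Transformers: Linear-layer weights are quantized once, activation scales are computed on the fly at inference time, and no held-out calibration data or change to training is involved.

```python
# Minimal sketch: standard post-training dynamic int8 quantization in PyTorch.
# NOTE: this is NOT the paper's method; it only illustrates the general setting
# (quantizing a BERT-like model for inference without retraining or calibration).
# The checkpoint name and example input are placeholders.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # placeholder BERT-like checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Swap Linear layers for dynamically quantized int8 versions: weights are
# quantized ahead of time, activations are quantized at run time, so no
# held-out calibration dataset is required.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("An example sentence.", return_tensors="pt")
with torch.no_grad():
    logits = quantized(**inputs).logits
print(logits)
```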
- Anthology ID: 2022.emnlp-industry.45
- Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month: December
- Year: 2022
- Address: Abu Dhabi, UAE
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 451–457
- URL: https://aclanthology.org/2022.emnlp-industry.45
- Cite (ACL): Yousef El-kurdi, Jerry Quinn, and Avi Sil. 2022. Zero-Shot Dynamic Quantization for Transformer Inference. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 451–457, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal): Zero-Shot Dynamic Quantization for Transformer Inference (El-kurdi et al., EMNLP 2022)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2022.emnlp-industry.45.pdf