Compressing Pre-trained Language Models by Matrix Decomposition

Matan Ben Noach, Yoav Goldberg


Abstract
Large pre-trained language models reach state-of-the-art results on many different NLP tasks when fine-tuned individually. However, they also come with significant memory and computational requirements, calling for methods to reduce model sizes (green AI). We propose a two-stage model-compression method to reduce a model's inference time cost. We first decompose the matrices in the model into smaller matrices and then perform feature distillation on the internal representations to recover from the decomposition. This approach reduces the number of parameters while preserving much of the information within the model. We evaluate our approach on the BERT-base model with the GLUE benchmark and show that we can reduce the number of parameters by a factor of 0.4x and increase inference speed by a factor of 1.45x, while incurring only a minimal loss in metric performance.
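The first stage factorizes each large weight matrix into two smaller ones. Below is a minimal sketch of that idea, assuming a truncated-SVD factorization of a single PyTorch linear layer; the rank value and layer sizes are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

def decompose_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate a Linear layer's weight with two smaller Linear layers via truncated SVD."""
    W = layer.weight.data                                  # shape (d_out, d_in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]                           # (d_out, rank), singular values folded in
    V_r = Vh[:rank, :]                                     # (rank, d_in)

    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data.copy_(V_r)
    second.weight.data.copy_(U_r)
    if layer.bias is not None:
        second.bias.data.copy_(layer.bias.data)
    return nn.Sequential(first, second)

# Example on BERT-base feed-forward dimensions (768 -> 3072), rank chosen for illustration:
layer = nn.Linear(768, 3072)
compressed = decompose_linear(layer, rank=256)
orig = layer.weight.numel()                                # 768 * 3072 = 2,359,296 weight parameters
new = sum(m.weight.numel() for m in compressed)            # 256 * (768 + 3072) = 983,040 weight parameters
print(f"weight parameters: {orig} -> {new}")

In the second stage, per the abstract, feature distillation trains the factored model to reproduce the original model's internal representations, recovering accuracy lost to the low-rank approximation.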
Anthology ID:
2020.aacl-main.88
Volume:
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
Month:
December
Year:
2020
Address:
Suzhou, China
Venue:
AACL
Publisher:
Association for Computational Linguistics
Pages:
884–889
URL:
https://aclanthology.org/2020.aacl-main.88
Cite (ACL):
Matan Ben Noach and Yoav Goldberg. 2020. Compressing Pre-trained Language Models by Matrix Decomposition. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pages 884–889, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Compressing Pre-trained Language Models by Matrix Decomposition (Ben Noach & Goldberg, AACL 2020)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2020.aacl-main.88.pdf
Data
GLUE, QNLI