LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression

Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang Wang, Quanlu Zhang, Yaming Yang, Yunhai Tong, Jing Bai


Abstract
BERT is a cutting-edge language representation model pre-trained on a large corpus, which achieves superior performance on various natural language understanding tasks. However, a major obstacle to applying BERT in online services is that it is memory-intensive and leads to unsatisfactory latency of user requests, raising the necessity of model compression. Existing solutions leverage the knowledge distillation framework to learn a smaller model that imitates the behaviors of BERT. However, the knowledge distillation procedure is itself expensive, as it requires sufficient training data to imitate the teacher model. In this paper, we address this issue by proposing a tailored solution named LadaBERT (Lightweight adaptation of BERT through hybrid model compression), which combines the advantages of different model compression methods, including weight pruning, matrix factorization and knowledge distillation. LadaBERT achieves state-of-the-art accuracy on various public datasets while the training overhead can be reduced by an order of magnitude.
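The abstract names the three compression ingredients without going into detail. The following minimal sketch (plain PyTorch, not the authors' implementation) illustrates what each ingredient typically looks like: magnitude-based weight pruning, low-rank matrix factorization via SVD, and a knowledge-distillation loss. The function names and the sparsity, rank, temperature, and mixing weight are illustrative assumptions.

import torch
import torch.nn.functional as F

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    # Zero out the smallest-magnitude entries of a weight matrix.
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

def svd_factorize(weight: torch.Tensor, rank: int = 64):
    # Approximate a (d_out x d_in) weight with two low-rank factors: A @ B.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (d_out x rank)
    B = Vh[:rank, :]             # (rank x d_in)
    return A, B

def distillation_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
    # Blend soft-label KL divergence against the teacher with hard-label cross entropy.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage on a random weight matrix and random logits (hypothetical shapes).
W = torch.randn(768, 768)
W_pruned = magnitude_prune(W, sparsity=0.5)
A, B = svd_factorize(W_pruned, rank=64)
print("reconstruction error:", torch.norm(W_pruned - A @ B).item())

student_logits = torch.randn(8, 2)
teacher_logits = torch.randn(8, 2)
labels = torch.randint(0, 2, (8,))
print("distillation loss:", distillation_loss(student_logits, teacher_logits, labels).item())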
Anthology ID: 2020.coling-main.287
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 3225–3234
URL: https://aclanthology.org/2020.coling-main.287
DOI: 10.18653/v1/2020.coling-main.287
Cite (ACL): Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang Wang, Quanlu Zhang, Yaming Yang, Yunhai Tong, and Jing Bai. 2020. LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3225–3234, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression (Mao et al., COLING 2020)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2020.coling-main.287.pdf