Towards Zero-Shot Knowledge Distillation for Natural Language Processing

Ahmad Rashid; Vasileios Lioutas; Abbas Ghaddar; Mehdi Rezagholizadeh

doi:10.18653/v1/2021.emnlp-main.526

Abstract

Knowledge distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep-learning-based natural language processing (NLP) solutions. In its regular manifestations, KD requires access to the teacher’s training data for knowledge transfer to the student network. However, privacy concerns, data regulations and proprietary reasons may prevent access to such data. We present, to the best of our knowledge, the first work on Zero-shot Knowledge Distillation for NLP, where the student learns from the much larger teacher without any task-specific data. Our solution combines out-of-domain data and adversarial training to learn the teacher’s output distribution. We investigate six tasks from the GLUE benchmark and demonstrate that we can achieve between 75% and 92% of the teacher’s classification score (accuracy or F1) while compressing the model 30 times.

Anthology ID: 2021.emnlp-main.526
Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2021
Address: Online and Punta Cana, Dominican Republic
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 6551–6561
URL: https://aclanthology.org/2021.emnlp-main.526
DOI: 10.18653/v1/2021.emnlp-main.526
Cite (ACL): Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, and Mehdi Rezagholizadeh. 2021. Towards Zero-Shot Knowledge Distillation for Natural Language Processing. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6551–6561, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal): Towards Zero-Shot Knowledge Distillation for Natural Language Processing (Rashid et al., EMNLP 2021)
PDF: https://preview.aclanthology.org/nschneid-patch-2/2021.emnlp-main.526.pdf
Video: https://preview.aclanthology.org/nschneid-patch-2/2021.emnlp-main.526.mp4
Data: GLUE, MRPC, MultiNLI, QNLI, SST, SST-2
