Natural Language Generation for Effective Knowledge Distillation

Raphael Tang; Yao Lu; Jimmy Lin

doi:10.18653/v1/D19-6122

Abstract

Knowledge distillation can effectively transfer knowledge from BERT, a deep language representation model, to traditional, shallow word embedding-based neural networks, helping them approach or exceed the quality of other heavyweight language representation models. As shown in previous work, critical to this distillation procedure is the construction of an unlabeled transfer dataset, which enables effective knowledge transfer. To create transfer set examples, we propose to sample from pretrained language models fine-tuned on task-specific text. Unlike previous techniques, this directly captures the purpose of the transfer set. We hypothesize that this principled, general approach outperforms rule-based techniques. On four datasets in sentiment classification, sentence similarity, and linguistic acceptability, we show that our approach improves upon previous methods. We outperform OpenAI GPT, a deep pretrained transformer, on three of the datasets, while using a single-layer bidirectional LSTM that runs at least ten times faster.
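The pipeline the abstract describes (sample unlabeled text from a model fit to task-specific text, have the teacher score it, then train the student to match the teacher) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: a toy bigram model stands in for the fine-tuned pretrained language model, `toy_teacher` is a hypothetical stand-in for BERT, and the mean-squared-error loss on logits is one common distillation objective that the abstract itself does not specify.

```python
import random
from collections import defaultdict


def fit_bigram_lm(sentences):
    """Record bigram transitions, using <s> and </s> as boundary tokens.

    Assumption: a toy bigram model replaces the fine-tuned pretrained LM.
    """
    transitions = defaultdict(list)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            transitions[prev].append(nxt)
    return transitions


def sample_sentence(transitions, rng, max_len=20):
    """Sample one sentence by walking the bigram transitions from <s>."""
    token, words = "<s>", []
    while len(words) < max_len:
        token = rng.choice(transitions[token])
        if token == "</s>":
            break
        words.append(token)
    return " ".join(words)


def build_transfer_set(task_sentences, teacher, num_samples, seed=0):
    """Generate unlabeled sentences and pair each with the teacher's logits."""
    rng = random.Random(seed)
    lm = fit_bigram_lm(task_sentences)
    samples = [sample_sentence(lm, rng) for _ in range(num_samples)]
    return [(s, teacher(s)) for s in samples]


def distillation_loss(student_logits, teacher_logits):
    """Mean squared error between student and teacher logits (assumed objective)."""
    squared = [(s - t) ** 2 for s, t in zip(student_logits, teacher_logits)]
    return sum(squared) / len(squared)


def toy_teacher(sentence):
    """Hypothetical stand-in for the BERT teacher: [negative, positive] logits."""
    words = sentence.split()
    return [float(words.count("dull")), float(words.count("great"))]


if __name__ == "__main__":
    task_text = ["the film was great", "the film was dull", "a great cast"]
    for sentence, logits in build_transfer_set(task_text, toy_teacher, 3):
        print(repr(sentence), logits)
```

In a realistic setting, the student (for example a single-layer bidirectional LSTM) would be trained by minimizing such a loss over the generated transfer set, typically alongside the original labeled data.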

Anthology ID: D19-6122
Volume: Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Colin Cherry, Greg Durrett, George Foster, Reza Haffari, Shahram Khadivi, Nanyun Peng, Xiang Ren, Swabha Swayamdipta
Venue: WS
Publisher: Association for Computational Linguistics
Pages: 202–208
URL: https://preview.aclanthology.org/fix-sig-urls/D19-6122/
DOI: 10.18653/v1/D19-6122
Cite (ACL): Raphael Tang, Yao Lu, and Jimmy Lin. 2019. Natural Language Generation for Effective Knowledge Distillation. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 202–208, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Natural Language Generation for Effective Knowledge Distillation (Tang et al., 2019)
PDF: https://preview.aclanthology.org/fix-sig-urls/D19-6122.pdf
Code: castorini/d-bert
