HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression

Chenhe Dong, Yaliang Li, Ying Shen, Minghui Qiu


Abstract
On many natural language processing tasks, large pre-trained language models (PLMs) have shown superior performance compared with traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered their deployment on resource-limited devices in practice. In this paper, we aim to compress PLMs with knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain-relational information. Specifically, to enhance model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture relational information across different domains. To dynamically select the most representative prototypes for each domain, we propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive experiments on public multi-domain datasets demonstrate the superior performance of our HRKD method as well as its strong few-shot learning ability. For reproducibility, we release the code at https://github.com/cheneydon/hrkd.
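For context, the standard knowledge-distillation objective that methods like HRKD build on can be sketched as follows. This is a minimal illustrative NumPy version of the temperature-scaled soft cross-entropy from Hinton et al. (2015), not the authors' HRKD method itself (the domain-relational graphs and compare-aggregate mechanism are not shown); all function names are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax with max-subtraction for numerical stability.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soft cross-entropy between teacher and student output distributions,
    # scaled by T^2 so gradients keep a comparable magnitude across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    return -(p_teacher * log_p_student).sum(axis=-1).mean() * temperature ** 2
```

The loss is minimized when the student's distribution matches the teacher's; a higher temperature softens both distributions so the student also learns from the teacher's relative rankings of incorrect classes.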
Anthology ID:
2021.emnlp-main.250
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Editors:
Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3126–3136
URL:
https://aclanthology.org/2021.emnlp-main.250
DOI:
10.18653/v1/2021.emnlp-main.250
Cite (ACL):
Chenhe Dong, Yaliang Li, Ying Shen, and Minghui Qiu. 2021. HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3126–3136, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression (Dong et al., EMNLP 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2021.emnlp-main.250.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-2/2021.emnlp-main.250.mp4
Code
cheneydon/hrkd
Data
MultiNLI