@inproceedings{dong-etal-2021-hrkd,
    title = "{HRKD}: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression",
    author = "Dong, Chenhe  and
      Li, Yaliang  and
      Shen, Ying  and
      Qiu, Minghui",
    editor = "Moens, Marie-Francine  and
      Huang, Xuanjing  and
      Specia, Lucia  and
      Yih, Scott Wen-tau",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2021.emnlp-main.250/",
    doi = "10.18653/v1/2021.emnlp-main.250",
    pages = "3126--3136",
    abstract = "On many natural language processing tasks, large pre-trained language models (PLMs) have shown overwhelming performances compared with traditional neural network methods. Nevertheless, their huge model size and low inference speed have hindered the deployment on resource-limited devices in practice. In this paper, we target to compress PLMs with knowledge distillation, and propose a hierarchical relational knowledge distillation (HRKD) method to capture both hierarchical and domain relational information. Specifically, to enhance the model capability and transferability, we leverage the idea of meta-learning and set up domain-relational graphs to capture the relational information across different domains. And to dynamically select the most representative prototypes for each domain, we propose a hierarchical compare-aggregate mechanism to capture hierarchical relationships. Extensive experiments on public multi-domain datasets demonstrate the superior performance of our HRKD method as well as its strong few-shot learning ability. For reproducibility, we release the code at \url{https://github.com/cheneydon/hrkd}."
}