Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation
Zhongjian Miao, Wen Zhang, Jinsong Su, Xiang Li, Jian Luan, Yidong Chen, Bin Wang, Min Zhang
Abstract
Conventional knowledge distillation (KD) approaches are commonly employed to compress neural machine translation (NMT) models. However, they yield only one lightweight student at a time. Consequently, we have to run KD multiple times when different students are required simultaneously, which can be resource-intensive. Moreover, these students are optimized individually and thus do not interact with one another, so their potential is not fully exploited. In this work, we propose a novel All-In-One Knowledge Distillation (AIO-KD) framework for NMT, which generates multiple satisfactory students at once. Under AIO-KD, we first randomly extract fewer-layer subnetworks from the teacher as sample students. We then jointly optimize the teacher and these students, where the students simultaneously learn knowledge from the teacher and interact with other students via mutual learning. At deployment time, we re-extract candidate students that satisfy the specifications of various devices. In particular, we adopt two carefully designed strategies for AIO-KD: 1) we dynamically detach gradients to prevent poorly performing students from negatively affecting the teacher during knowledge transfer, which could in turn harm the other students; 2) we design a two-stage mutual learning strategy, which alleviates the negative impact of poorly performing students on early-stage student interactions. Extensive experiments and in-depth analyses on three benchmarks demonstrate the effectiveness and eco-friendliness of AIO-KD. Our source code is available at https://github.com/DeepLearnXMU/AIO-KD.
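The abstract outlines the training recipe at a high level: sample students are randomly extracted as fewer-layer subnetworks of the teacher, the teacher and students are optimized jointly with knowledge distillation plus mutual learning, gradients are detached so that weak students do not hurt the teacher, and mutual learning is enabled only in a second stage. The sketch below shows one possible shape of such a joint training step in PyTorch; it is not the authors' implementation (see the linked repository), and names such as `extract_student`, `warmup_steps`, `alpha`, and `beta` are illustrative assumptions. In particular, the paper detaches gradients dynamically based on student quality, whereas this sketch always detaches the teacher's outputs for simplicity.

```python
# Illustrative sketch only, not the authors' code: a simplified AIO-KD-style
# joint training step in PyTorch. Model interfaces (model(src, tgt) -> logits
# of shape [batch, time, vocab]) and the helper extract_student are assumed.
import random

import torch.nn.functional as F


def soft_target_loss(learner_logits, target_logits):
    """KL divergence from a detached target distribution to the learner."""
    return F.kl_div(
        F.log_softmax(learner_logits, dim=-1),
        F.softmax(target_logits.detach(), dim=-1),  # detached: no gradient to the target
        reduction="batchmean",
    )


def aio_kd_step(teacher, extract_student, batch, step,
                num_students=2, warmup_steps=10_000, alpha=1.0, beta=1.0):
    """One joint optimization step over the teacher and randomly sampled students."""
    src, tgt = batch

    # The teacher is trained on the translation task as usual.
    teacher_logits = teacher(src, tgt)
    loss = F.cross_entropy(teacher_logits.transpose(1, 2), tgt)

    # Randomly extract fewer-layer subnetworks of the teacher as sample students.
    students = [
        extract_student(teacher, depth=random.randint(1, teacher.num_layers - 1))
        for _ in range(num_students)
    ]
    student_logits = [s(src, tgt) for s in students]

    # Knowledge transfer: each student fits the data and the teacher's outputs.
    # The teacher's logits are detached here; the paper detaches dynamically,
    # depending on student quality, to shield the teacher from weak students.
    for logits in student_logits:
        loss = loss + F.cross_entropy(logits.transpose(1, 2), tgt)
        loss = loss + alpha * soft_target_loss(logits, teacher_logits)

    # Two-stage mutual learning: student-student interaction is enabled only
    # after a warm-up stage, so weak early-stage students do not mislead peers.
    if step >= warmup_steps:
        for i in range(len(student_logits)):
            for j in range(i + 1, len(student_logits)):
                loss = loss + beta * (
                    soft_target_loss(student_logits[i], student_logits[j])
                    + soft_target_loss(student_logits[j], student_logits[i])
                )

    return loss
```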
- Anthology ID: 2023.emnlp-main.178
- Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 2929–2940
- URL: https://aclanthology.org/2023.emnlp-main.178
- DOI: 10.18653/v1/2023.emnlp-main.178
- Cite (ACL): Zhongjian Miao, Wen Zhang, Jinsong Su, Xiang Li, Jian Luan, Yidong Chen, Bin Wang, and Min Zhang. 2023. Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2929–2940, Singapore. Association for Computational Linguistics.
- Cite (Informal): Exploring All-In-One Knowledge Distillation Framework for Neural Machine Translation (Miao et al., EMNLP 2023)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2023.emnlp-main.178.pdf