A Self-Distillation Recipe for Neural Machine Translation

Hongfei Xu, Zhuofei Liang, Qiuhui Liu, Lingling Mu


Abstract
Self-distillation distills knowledge from the deeper sub-networks into the shallower sub-networks without an extra teacher model, and has proven effective on a range of computer vision tasks. In this paper, we study representation-based self-distillation methods for Neural Machine Translation (NMT), given the efficiency issue posed by the large vocabulary. We present a rank-order augmented Pearson correlation loss and an iterative distillation method to prevent the prediction discrepancy between the student and a stronger teacher from disturbing training. To keep the teacher from misleading the student's learning, we employ a warm-up strategy and present a gradient adaption method that scales down or zeroes the Knowledge Distillation (KD) gradients which conflict with the translation gradients. Experiments show that our method leads to significant improvements over the strong Transformer baseline on low-, middle-, and high-resource tasks, obtaining performance comparable to previous MT KD studies without pre-training a teacher. Experiments with deeper Transformers show that our method achieves comparable or better performance with fewer layers.
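The abstract compresses two technical ingredients: a correlation-based representation distillation loss and a gradient adaption rule. The PyTorch-style sketch below only illustrates these two ideas under simple assumptions (hidden states of shape (batch, seq_len, dim), zeroing rather than scaling conflicting gradients, and a plain Pearson term without the rank-order augmentation); the function names are hypothetical and the exact formulations are given in the paper.

```python
import torch


def pearson_distill_loss(student_h: torch.Tensor,
                         teacher_h: torch.Tensor,
                         eps: float = 1e-6) -> torch.Tensor:
    """Representation-level KD loss: 1 - Pearson correlation between student
    and teacher hidden states, computed over the feature dimension and
    averaged over all tokens. The teacher states are detached so that only
    the shallower (student) sub-network receives KD gradients.
    Assumed shapes: (batch, seq_len, dim)."""
    t = teacher_h.detach()
    s = student_h - student_h.mean(dim=-1, keepdim=True)
    t = t - t.mean(dim=-1, keepdim=True)
    corr = (s * t).sum(dim=-1) / (s.norm(dim=-1) * t.norm(dim=-1) + eps)
    return (1.0 - corr).mean()


def adapt_kd_grads(kd_grads, mt_grads):
    """Gradient adaption sketch: drop a parameter's KD gradient when it points
    against that parameter's translation (MT) gradient, and keep it otherwise.
    (The paper scales down or zeroes such gradients; zeroing is shown here.)"""
    adapted = []
    for g_kd, g_mt in zip(kd_grads, mt_grads):
        if g_kd is None or g_mt is None:
            adapted.append(g_kd)
            continue
        agree = torch.sum(g_kd * g_mt)
        adapted.append(g_kd if agree >= 0 else torch.zeros_like(g_kd))
    return adapted
```

In a training loop, the KD and translation gradients could be obtained separately (e.g., via torch.autograd.grad on each loss) and combined after adaption, before the optimizer step; the warm-up and iterative distillation schedule described in the abstract are omitted here.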
Anthology ID: 2025.findings-acl.261
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5050–5064
URL: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.261/
Cite (ACL): Hongfei Xu, Zhuofei Liang, Qiuhui Liu, and Lingling Mu. 2025. A Self-Distillation Recipe for Neural Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 5050–5064, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): A Self-Distillation Recipe for Neural Machine Translation (Xu et al., Findings 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.261.pdf