Domain-Aware k-Nearest-Neighbor Knowledge Distillation for Machine Translation

Zhexuan Wang, Shudong Liu, Xuebo Liu, Miao Zhang, Derek Wong, Min Zhang


Abstract
kNN-MT leverages neighborhood knowledge as an auxiliary signal during decoding, significantly improving translation performance. kNN-KD subsequently moves the use of neighborhood knowledge from the decoding phase to the training phase, addressing the time and space inefficiencies inherent in kNN-MT. However, kNN-KD transfers all kNN knowledge indiscriminately, which can restrict the learning of student models. In this paper, we propose a novel domain-aware kNN-KD method that selects domain-relevant neighborhood knowledge for learning during the distillation process. Notably, the entire process relies solely on the neighborhood knowledge of the original model, eliminating the need to build any additional datastores. Experiments on four domain translation tasks demonstrate that our method achieves state-of-the-art performance, with an average gain of 1.55 COMET and 1.42 BLEU, by further improving the translation of rare words. Source code can be accessed at https://github.com/wangzx1219/Dk-KD.
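For intuition, the following is a minimal sketch of kNN-based knowledge distillation with a domain-relevance filter on the retrieved neighbors. It is not the authors' released implementation (see the linked repository); the datastore layout, the domain labels, the temperature, and the interpolation weight `lam` are illustrative assumptions.

```python
# Sketch: build a kNN soft-target distribution per target position, keeping only
# neighbors whose domain label matches the query domain, then distill it into the
# student alongside the gold-token cross-entropy. All names here are hypothetical.
import torch
import torch.nn.functional as F

def knn_soft_targets(hidden, keys, values, domain_ids, query_domain,
                     vocab_size, k=8, temperature=10.0):
    # hidden: (d,) decoder state at one target position
    # keys: (N, d) datastore keys; values: (N,) target-token ids;
    # domain_ids: (N,) domain label per entry (assumed available)
    dists = torch.cdist(hidden.unsqueeze(0), keys).squeeze(0)   # (N,) L2 distances
    topk_dists, topk_idx = dists.topk(k, largest=False)         # k nearest entries
    keep = domain_ids[topk_idx] == query_domain                 # domain-aware filter
    if keep.sum() == 0:                                         # fall back to all k neighbors
        keep = torch.ones_like(keep, dtype=torch.bool)
    weights = F.softmax(-topk_dists[keep] / temperature, dim=-1)
    soft = torch.zeros(vocab_size)
    soft.scatter_add_(0, values[topk_idx][keep], weights)       # aggregate weights by token id
    return soft                                                  # (vocab_size,) soft targets

def distillation_loss(student_logits, gold, soft_targets, lam=0.5):
    # student_logits: (B, V), gold: (B,), soft_targets: (B, V)
    ce = F.cross_entropy(student_logits, gold)
    kl = F.kl_div(F.log_softmax(student_logits, dim=-1), soft_targets,
                  reduction="batchmean")
    return (1 - lam) * ce + lam * kl                             # interpolate gold CE and kNN KD
```

The filtering step is the only change relative to plain kNN-KD in this sketch: neighbors from other domains are dropped before the softmax over negative distances, so the student is distilled only from domain-relevant neighborhood knowledge.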
Anthology ID:
2024.findings-acl.563
Volume:
Findings of the Association for Computational Linguistics: ACL 2024
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9458–9469
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-acl.563/
DOI:
10.18653/v1/2024.findings-acl.563
Cite (ACL):
Zhexuan Wang, Shudong Liu, Xuebo Liu, Miao Zhang, Derek Wong, and Min Zhang. 2024. Domain-Aware k-Nearest-Neighbor Knowledge Distillation for Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2024, pages 9458–9469, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
Domain-Aware k-Nearest-Neighbor Knowledge Distillation for Machine Translation (Wang et al., Findings 2024)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-acl.563.pdf