Domain-Aware k-Nearest-Neighbor Knowledge Distillation for Machine Translation
Zhexuan Wang, Shudong Liu, Xuebo Liu, Miao Zhang, Derek Wong, Min Zhang
Abstract
kNN-MT uses neighborhood knowledge retrieved at decoding time as an auxiliary signal, significantly improving translation performance. kNN-KD subsequently moves the use of neighborhood knowledge from the decoding phase to the training phase, addressing the time and space inefficiencies inherent in kNN-MT. However, kNN-KD transfers all kNN knowledge indiscriminately, which can restrict the learning of student models. In this paper, we propose a novel domain-aware kNN-KD method that selects domain-relevant neighborhood knowledge for learning during the distillation process. Notably, the entire process relies solely on the neighborhood knowledge of the original model, eliminating the need to build any additional datastores. Experiments on four domain translation tasks demonstrate that our method achieves state-of-the-art performance, with average gains of 1.55 COMET and 1.42 BLEU, by further improving the translation of rare words. Source code is available at https://github.com/wangzx1219/Dk-KD.
- Anthology ID: 2024.findings-acl.563
- Volume: Findings of the Association for Computational Linguistics: ACL 2024
- Month: August
- Year: 2024
- Address: Bangkok, Thailand
- Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 9458–9469
- URL: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-acl.563/
- DOI: 10.18653/v1/2024.findings-acl.563
- Cite (ACL): Zhexuan Wang, Shudong Liu, Xuebo Liu, Miao Zhang, Derek Wong, and Min Zhang. 2024. Domain-Aware k-Nearest-Neighbor Knowledge Distillation for Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2024, pages 9458–9469, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal): Domain-Aware k-Nearest-Neighbor Knowledge Distillation for Machine Translation (Wang et al., Findings 2024)
- PDF: https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-acl.563.pdf
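To make the mechanism in the abstract concrete, below is a minimal PyTorch sketch of the general kNN-KD recipe with a filtering step: retrieve the k nearest neighbors of a decoder hidden state from a datastore of cached (hidden state, target token) pairs, keep only the neighbors judged domain-relevant, turn them into a soft target distribution, and distill the student toward it alongside the usual cross-entropy. This is not the paper's exact implementation: the datastore contents, the names (`keys`, `values`, `filtered_knn_distribution`, `dk_kd_loss`), the hyperparameters (`k`, `temperature`, `tau`, `alpha`), and especially the plain distance threshold used as the domain-relevance filter are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical datastore: hidden states ("keys") and their paired target-token
# ids ("values"), cached from a forward pass of the base NMT model. Sizes and
# random contents are placeholders for demonstration only.
torch.manual_seed(0)
VOCAB, DIM, N = 100, 16, 500
keys = torch.randn(N, DIM)
values = torch.randint(0, VOCAB, (N,))

def filtered_knn_distribution(query, k=8, temperature=10.0, tau=6.0):
    """Retrieve k nearest neighbors of `query` and turn the domain-relevant
    ones into a soft target distribution over the vocabulary. A simple L2
    distance threshold `tau` stands in for the paper's domain-aware filter."""
    dists = torch.cdist(query.unsqueeze(0), keys).squeeze(0)   # (N,) L2 distances
    nn_dists, nn_idx = torch.topk(dists, k, largest=False)     # k nearest neighbors
    keep = nn_dists <= tau                                     # drop "out-of-domain" neighbors
    if not keep.any():                                         # nothing relevant retrieved
        return None
    weights = F.softmax(-nn_dists[keep] / temperature, dim=0)  # closer -> heavier weight
    p_knn = torch.zeros(VOCAB)
    p_knn.scatter_add_(0, values[nn_idx[keep]], weights)       # aggregate weights per token id
    return p_knn

def dk_kd_loss(student_logits, gold_id, p_knn, alpha=0.5):
    """Interpolate gold-token cross-entropy with distillation toward p_knn.
    -(p . log q) equals KL(p || q) up to a constant, so gradients match."""
    log_q = F.log_softmax(student_logits, dim=-1)
    ce = -log_q[gold_id]
    if p_knn is None:                                          # fall back to plain CE
        return ce
    kd = -(p_knn * log_q).sum()
    return (1 - alpha) * ce + alpha * kd

# One target position: base-model hidden state as the query, student logits as q.
query = torch.randn(DIM)
student_logits = torch.randn(VOCAB)
p_knn = filtered_knn_distribution(query)
loss = dk_kd_loss(student_logits, gold_id=3, p_knn=p_knn)
print(float(loss))
```

Because retrieval happens against the base model's own cached representations, a setup like this needs no datastore beyond what the original model already provides, which matches the efficiency claim in the abstract; how domain relevance is actually scored is the paper's contribution and is only crudely approximated here.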