Multilingual k-Nearest-Neighbor Machine Translation

David Stap, Christof Monz


Abstract
k-nearest-neighbor machine translation has demonstrated remarkable improvements in machine translation quality by creating a datastore of cached examples. However, these improvements have been limited to high-resource language pairs, with large datastores, and remain a challenge for low-resource languages. In this paper, we address this issue by combining representations from multiple languages into a single datastore. Our results consistently demonstrate substantial improvements not only in low-resource translation quality (up to +3.6 BLEU), but also for high-resource translation quality (up to +0.5 BLEU). Our experiments show that it is possible to create multilingual datastores that are a quarter of the size, achieving a 5.3x speed improvement, by using linguistic similarities for datastore creation.
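
To make the idea of a single multilingual datastore concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the standard kNN-MT formulation, in which the datastore maps decoder hidden states (keys) to target tokens (values) and the retrieved distribution is interpolated with the model's distribution. The function names, the toy 2-dimensional keys, the language-pair labels, and the parameters k, temperature and lam are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def build_datastore(per_language_entries):
    """Merge (key_vector, target_token) pairs from several language pairs
    into one flat datastore. The language-pair label is kept only for inspection."""
    datastore = []
    for lang_pair, entries in per_language_entries.items():
        for key, token in entries:
            datastore.append((tuple(key), token, lang_pair))
    return datastore


def knn_distribution(datastore, query, k=2, temperature=1.0):
    """Retrieve the k nearest keys (squared L2 distance, brute force) and turn
    their distances into a normalized distribution over target tokens."""
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token, _ in datastore
    )[:k]
    weights = defaultdict(float)
    for dist, token in scored:
        weights[token] += math.exp(-dist / temperature)
    total = sum(weights.values())
    return {token: weight / total for token, weight in weights.items()}


def interpolate(p_model, p_knn, lam=0.5):
    """Mix the model distribution with the retrieved one: lam * kNN + (1 - lam) * model."""
    vocab = set(p_model) | set(p_knn)
    return {
        t: lam * p_knn.get(t, 0.0) + (1.0 - lam) * p_model.get(t, 0.0)
        for t in vocab
    }


# Toy example: a small low-resource datastore combined with a related language's entries.
entries = {
    "be-en": [((0.0, 0.0), "house")],
    "ru-en": [((0.1, 0.0), "house"), ((5.0, 5.0), "tree")],
}
datastore = build_datastore(entries)
p_knn = knn_distribution(datastore, query=(0.0, 0.1), k=2)
p_final = interpolate({"house": 0.4, "tree": 0.6}, p_knn, lam=0.5)
```

In this toy setup the two nearest neighbors come from different language pairs but agree on the target token, so the retrieved distribution shifts probability toward "house". A real system would replace the brute-force search with an approximate nearest-neighbor index over high-dimensional decoder states.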
Anthology ID:
2023.emnlp-main.571
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9200–9208
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.571/
DOI:
10.18653/v1/2023.emnlp-main.571
Cite (ACL):
David Stap and Christof Monz. 2023. Multilingual k-Nearest-Neighbor Machine Translation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9200–9208, Singapore. Association for Computational Linguistics.
Cite (Informal):
Multilingual k-Nearest-Neighbor Machine Translation (Stap & Monz, EMNLP 2023)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.571.pdf
Video:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2023.emnlp-main.571.mp4