Entropy-Based Vocabulary Substitution for Incremental Learning in Multilingual Neural Machine Translation

Kaiyu Huang, Peng Li, Jin Ma, Yang Liu


Abstract
In a practical real-world scenario, the longstanding goal is that a universal multilingual translation model can be incrementally updated when new language pairs arrive. Specifically, the initial vocabulary only covers some of the words in the new languages, which hurts translation quality under incremental learning. Although existing approaches attempt to address this issue by replacing the original vocabulary with a rebuilt one or by constructing independent language-specific vocabularies, these methods cannot meet the following three demands simultaneously: (1) high translation quality for both original and incremental languages, (2) low cost for model training, and (3) low time overhead for preprocessing. In this work, we propose an entropy-based vocabulary substitution (EVS) method that requires only a single pass over the new language pairs for incremental learning in large-scale multilingual data updating, while keeping the vocabulary size unchanged. Our method can learn new knowledge from updated training samples incrementally while maintaining high translation quality for the original language pairs, alleviating the issue of catastrophic forgetting. Experimental results show that EVS achieves better performance and avoids excess overhead for incremental learning in multilingual machine translation.
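As a rough illustration of the idea described in the abstract (not the authors' released implementation), the Python sketch below keeps the vocabulary size fixed by swapping the least informative original tokens for high-entropy tokens from the new-language corpus. The function names `token_entropies` and `substitute_vocab`, and the unigram entropy-contribution scoring, are assumptions for illustration; the paper's exact scoring operates over its subword vocabulary.

```python
import math
from collections import Counter

def token_entropies(corpus_tokens):
    """Per-token contribution -p * log(p) to the corpus unigram entropy
    (an assumed, simplified proxy for the paper's entropy criterion)."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {tok: -(c / total) * math.log(c / total) for tok, c in counts.items()}

def substitute_vocab(old_vocab, old_corpus, new_corpus, budget):
    """Swap up to `budget` of the least informative old tokens for the most
    informative unseen tokens from the new-language corpus, so the
    vocabulary size stays constant."""
    old_ent = token_entropies(old_corpus)
    new_ent = token_entropies(new_corpus)
    # Tokens to add: high-entropy new-language tokens not already covered.
    candidates = sorted(
        (t for t in new_ent if t not in old_vocab),
        key=lambda t: new_ent[t],
        reverse=True,
    )[:budget]
    # Tokens to drop: old tokens contributing least entropy. Scoring over the
    # joint old+new corpora avoids dropping tokens the new languages reuse.
    joint_ent = token_entropies(list(old_corpus) + list(new_corpus))
    droppable = sorted(old_vocab, key=lambda t: joint_ent.get(t, 0.0))[:len(candidates)]
    return (set(old_vocab) - set(droppable)) | set(candidates)

# Toy usage: two low-value English tokens are replaced by frequent French ones.
vocab = {"the", "cat", "sat", "mat", "rare1", "rare2"}
old = "the cat sat on the mat the cat sat".split()
new = "le chat dort le chat le chat".split()
print(substitute_vocab(vocab, old, new, budget=2))
```

Only the substituted embedding rows would then need reinitialization before continued training, which is what keeps the update cheap relative to rebuilding the vocabulary from scratch.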
Anthology ID:
2022.emnlp-main.720
Original:
2022.emnlp-main.720v1
Version 2:
2022.emnlp-main.720v2
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10537–10550
URL:
https://aclanthology.org/2022.emnlp-main.720
DOI:
10.18653/v1/2022.emnlp-main.720
Cite (ACL):
Kaiyu Huang, Peng Li, Jin Ma, and Yang Liu. 2022. Entropy-Based Vocabulary Substitution for Incremental Learning in Multilingual Neural Machine Translation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10537–10550, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Entropy-Based Vocabulary Substitution for Incremental Learning in Multilingual Neural Machine Translation (Huang et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/ingest-acl-2023-videos/2022.emnlp-main.720.pdf