Abstract
In practical real-world scenarios, a longstanding goal is for a universal multilingual translation model to be incrementally updated as new language pairs arrive. In particular, the initial vocabulary covers only some of the words in the new languages, which hurts translation quality during incremental learning. Although existing approaches attempt to address this issue by replacing the original vocabulary with a rebuilt one or by constructing independent language-specific vocabularies, these methods cannot meet the following three demands simultaneously: (1) high translation quality for both original and incremental languages, (2) low cost for model training, and (3) low time overhead for preprocessing. In this work, we propose an entropy-based vocabulary substitution (EVS) method that needs only a single pass over the new language pairs for incremental learning under large-scale multilingual data updating, while keeping the vocabulary size unchanged. Our method incrementally learns new knowledge from updated training samples while maintaining high translation quality for the original language pairs, alleviating the issue of catastrophic forgetting. Experimental results show that EVS achieves better performance and avoids excess overhead for incremental learning in the multilingual machine translation task.
- Anthology ID:
- 2022.emnlp-main.720
- Original:
- 2022.emnlp-main.720v1
- Version 2:
- 2022.emnlp-main.720v2
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10537–10550
- URL:
- https://aclanthology.org/2022.emnlp-main.720
- DOI:
- 10.18653/v1/2022.emnlp-main.720
- Cite (ACL):
- Kaiyu Huang, Peng Li, Jin Ma, and Yang Liu. 2022. Entropy-Based Vocabulary Substitution for Incremental Learning in Multilingual Neural Machine Translation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10537–10550, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Entropy-Based Vocabulary Substitution for Incremental Learning in Multilingual Neural Machine Translation (Huang et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.emnlp-main.720.pdf
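The core idea in the abstract, substituting low-utility vocabulary entries with tokens from the new languages while keeping the vocabulary size fixed, can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's actual algorithm: the function names (`entropy_contribution`, `substitute_vocab`), the per-token score `-p log2 p`, and the swap budget are all assumptions made for illustration.

```python
from collections import Counter
import math

def entropy_contribution(counts):
    """Per-token contribution -p * log2(p) to the corpus entropy.

    `counts` maps token -> frequency. Tokens that carry little
    probability mass contribute little entropy and are candidates
    for substitution. (Hypothetical scoring; the paper's exact
    criterion may differ.)
    """
    total = sum(counts.values())
    return {t: -(c / total) * math.log2(c / total) for t, c in counts.items()}

def substitute_vocab(orig_counts, new_counts, vocab_size):
    """Keep the vocabulary at `vocab_size` by swapping the
    lowest-entropy original tokens for the highest-entropy tokens
    observed in the new language pairs (illustrative only)."""
    orig_scores = entropy_contribution(orig_counts)
    new_scores = entropy_contribution(new_counts)
    # Original tokens, most useful first.
    kept = sorted(orig_scores, key=orig_scores.get, reverse=True)
    # New-language tokens not already covered, most useful first.
    incoming = sorted(
        (t for t in new_scores if t not in orig_scores),
        key=new_scores.get, reverse=True,
    )
    vocab = kept[:vocab_size]
    # Swap budget of a quarter of the vocabulary (an arbitrary choice
    # for this sketch).
    n_swap = min(len(incoming), vocab_size // 4)
    if n_swap:
        vocab = vocab[:vocab_size - n_swap] + incoming[:n_swap]
    return vocab

# Toy example: substitute one rare original token with a frequent
# new-language token while the vocabulary size stays at 4.
orig = Counter({"the": 50, "cat": 10, "sat": 5, "rare": 1})
new = Counter({"chat": 20, "le": 30, "the": 5})
print(substitute_vocab(orig, new, vocab_size=4))
```

Because the embedding slots of the dropped tokens are reused for the incoming tokens, a setup like this keeps the model's embedding matrix the same size, which is what allows training to continue from the original checkpoint instead of restarting from scratch.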