Abstract
Multilingual Neural Machine Translation (MNMT) models are commonly trained on a joint set of bilingual corpora that is acutely English-centric (i.e., English is either the source or the target language). While direct data between two non-English languages is sometimes explicitly available, it is rarely used. In this paper, we first take a step back and look at the commonly used bilingual corpora (WMT), and resurface the existence and importance of an implicit structure within them: multi-way alignment across examples (the same sentence in more than two languages). We set out to study the use of multi-way aligned examples to enrich the original English-centric parallel corpora. We reintroduce this direct parallel data from multi-way aligned corpora between all source and target languages. By doing so, the English-centric graph expands into a complete graph in which every language pair is connected. We call MNMT with such a connectivity pattern complete Multilingual Neural Machine Translation (cMNMT) and demonstrate its utility and efficacy with a series of experiments and analyses. In combination with a novel training data sampling strategy that is conditioned on the target language only, cMNMT yields competitive translation quality for all language pairs. We further study the effect of the size of the multi-way aligned data, its transfer learning capabilities, and how it eases adding a new language in MNMT. Finally, we stress test cMNMT at scale and demonstrate that we can train a cMNMT model with up to 12,432 language pairs that provides competitive translation quality for all language pairs.
- Anthology ID:
- 2020.wmt-1.66
- Volume:
- Proceedings of the Fifth Conference on Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 550–560
- URL:
- https://aclanthology.org/2020.wmt-1.66
- Cite (ACL):
- Markus Freitag and Orhan Firat. 2020. Complete Multilingual Neural Machine Translation. In Proceedings of the Fifth Conference on Machine Translation, pages 550–560, Online. Association for Computational Linguistics.
- Cite (Informal):
- Complete Multilingual Neural Machine Translation (Freitag & Firat, WMT 2020)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2020.wmt-1.66.pdf
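The core construction in the abstract, turning multi-way aligned examples into direct training pairs for every source/target combination, can be illustrated with a minimal sketch. It assumes each multi-way aligned example is represented as a dictionary from a language code to the same sentence in that language; the function names `complete_pairs` and `num_directed_pairs` and the toy sentences are hypothetical and not taken from the paper. It also checks that 12,432 directed language pairs is consistent with a complete graph over 112 languages, since n × (n − 1) = 112 × 111 = 12,432.

```python
from itertools import permutations


def complete_pairs(multiway_examples):
    """Expand multi-way aligned examples into direct (src, tgt) pairs.

    Hypothetical representation: each example is a dict mapping a
    language code to the same sentence in that language.
    """
    pairs = []
    for example in multiway_examples:
        # Every ordered pair of languages present in the example becomes
        # a direct training pair, not just pairs that include English.
        for src, tgt in permutations(sorted(example), 2):
            pairs.append((src, tgt, example[src], example[tgt]))
    return pairs


def num_directed_pairs(num_languages):
    # A complete directed graph on n languages has n * (n - 1) edges.
    return num_languages * (num_languages - 1)


# Toy example (hypothetical data): one sentence aligned across three languages.
example = {"en": "Hello world", "de": "Hallo Welt", "fr": "Bonjour le monde"}
pairs = complete_pairs([example])
english_centric = [p for p in pairs if "en" in (p[0], p[1])]

print(len(pairs))                # 6 directions in the complete graph
print(len(english_centric))      # 4 of them involve English
print(num_directed_pairs(112))   # 12432
```

In this toy case an English-centric corpus covers only the four directions en↔de and en↔fr, while the complete graph also recovers the direct de↔fr directions from the same aligned example.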