An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages
Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, David Yarowsky
Abstract
In this work, we explore massively multilingual low-resource neural machine translation. Using translations of the Bible (which have parallel structure across languages), we train models with up to 1,107 source languages. We create various multilingual corpora, varying the number and relatedness of source languages. Using these, we investigate the best ways to use this many-way aligned resource for multilingual machine translation. Our experiments employ a grammatically and phylogenetically diverse set of source languages during testing for more representative evaluations. We find that best practices in this domain are highly language-specific: adding more languages to a training set is often better, but too many harms performance; the best number depends on the source language. Furthermore, training on related languages can improve or degrade performance, depending on the language. As there is no one-size-fits-most answer, we find that it is critical to tailor one's approach to the source language and its typology.
- Anthology ID: 2020.lrec-1.458
- Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month: May
- Year: 2020
- Address: Marseille, France
- Venue: LREC
- Publisher: European Language Resources Association
- Pages: 3710–3718
- Language: English
- URL: https://aclanthology.org/2020.lrec-1.458
- Cite (ACL): Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, and David Yarowsky. 2020. An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3710–3718, Marseille, France. European Language Resources Association.
- Cite (Informal): An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages (Mueller et al., LREC 2020)
- PDF: https://preview.aclanthology.org/remove-xml-comments/2020.lrec-1.458.pdf
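The abstract's central resource is a many-way aligned corpus: every Bible translation shares the same verse structure, so any source language can be paired with a target language verse by verse, and different multilingual training sets can be formed simply by choosing which source languages to include. The Python sketch below illustrates that pairing step only. The data layout (language code mapped to verse ID mapped to text), the function name `build_multisource_corpus`, the verse IDs, and the use of English as the single target are all hypothetical choices for illustration, not the authors' actual pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: verse ID -> verse text for one translation.
Bible = Dict[str, str]


def build_multisource_corpus(
    bibles: Dict[str, Bible],
    source_langs: List[str],
    target_lang: str,
) -> List[Tuple[str, str, str]]:
    """Pair each source-language verse with the target verse sharing its ID.

    Returns (source_lang, source_text, target_text) triples. Verses missing
    from either side are skipped, since not every translation is complete.
    """
    target = bibles[target_lang]
    pairs: List[Tuple[str, str, str]] = []
    for lang in source_langs:
        # Sort by verse ID so the output order is deterministic.
        for verse_id, text in sorted(bibles[lang].items()):
            if verse_id in target:
                pairs.append((lang, text, target[verse_id]))
    return pairs


if __name__ == "__main__":
    # Toy data with illustrative verse IDs and truncated verse text.
    bibles = {
        "eng": {"GEN.1.1": "In the beginning ...", "GEN.1.2": "And the earth ..."},
        "deu": {"GEN.1.1": "Am Anfang ...", "GEN.1.2": "Und die Erde ..."},
        "fra": {"GEN.1.1": "Au commencement ..."},
    }
    # Varying the chosen source languages yields different multilingual corpora.
    for sources in (["deu"], ["deu", "fra"]):
        corpus = build_multisource_corpus(bibles, sources, "eng")
        print(sources, len(corpus))
```

Selecting source languages by number or by relatedness (for example, only languages from one family) changes only the `source_langs` argument; the verse alignment itself does the rest.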