Facebook AI’s WMT21 News Translation Task Submission

Chau Tran, Shruti Bhosale, James Cross, Philipp Koehn, Sergey Edunov, Angela Fan


Abstract
We describe Facebook’s multilingual model submission to the WMT2021 shared task on news translation. We participate in 14 language directions: English to and from Czech, German, Hausa, Icelandic, Japanese, Russian, and Chinese. To develop systems covering all these directions, we focus on multilingual models. We utilize data from all available sources — WMT, large-scale data mining, and in-domain backtranslation — to create high quality bilingual and multilingual baselines. Subsequently, we investigate strategies for scaling multilingual model size, such that one system has sufficient capacity for high quality representations of all eight languages. Our final submission is an ensemble of dense and sparse Mixture-of-Expert multilingual translation models, followed by finetuning on in-domain news data and noisy channel reranking. Compared to previous year’s winning submissions, our multilingual system improved the translation quality on all language directions, with an average improvement of 2.0 BLEU. In the WMT2021 task, our system ranks first in 10 directions based on automatic evaluation.
Anthology ID:
2021.wmt-1.19
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Venues:
EMNLP | WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Note:
Pages:
205–215
Language:
URL:
https://aclanthology.org/2021.wmt-1.19
DOI:
Bibkey:
Cite (ACL):
Chau Tran, Shruti Bhosale, James Cross, Philipp Koehn, Sergey Edunov, and Angela Fan. 2021. Facebook AI’s WMT21 News Translation Task Submission. In Proceedings of the Sixth Conference on Machine Translation, pages 205–215, Online. Association for Computational Linguistics.
Cite (Informal):
Facebook AI’s WMT21 News Translation Task Submission (Tran et al., WMT 2021)
Copy Citation:
PDF:
https://preview.aclanthology.org/update-css-js/2021.wmt-1.19.pdf
Data
CC100CCAlignedCCMatrixWMT 2020