Multilingual Denoising Pre-training for Neural Machine Translation

Yinhan Liu; Jiatao Gu; Naman Goyal; Xian Li; Sergey Edunov; Marjan Ghazvininejad; Mike Lewis; Luke Zettlemoyer

doi:10.1162/tacl_a_00343

Abstract

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART, a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective (Lewis et al., 2019). mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches have focused only on the encoder, the decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables transfer to language pairs that have no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute most to effective pre-training.
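
The BART objective corrupts each input instance and trains the sequence-to-sequence model to reconstruct the original text. As a minimal sketch, assuming the noise settings described in the body of the paper (masking about 35% of the words in each instance using spans whose lengths are drawn from a Poisson distribution with λ = 3.5, and permuting the order of sentences within the instance), the noising step might look like the Python below. Whitespace tokenization, the `<mask>` string, clamping span lengths to at least 1, merging adjacent masked spans, and every function name here are illustrative assumptions, not the released implementation.

```python
import math
import random

MASK = "<mask>"  # hypothetical mask-token string (assumption)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one sample from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def permute_sentences(sentences: list[str], rng: random.Random) -> list[str]:
    """Return the sentences of one instance in a random order."""
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def text_infill(tokens: list[str], rng: random.Random,
                mask_ratio: float = 0.35, lam: float = 3.5) -> list[str]:
    """Mask about mask_ratio of the tokens using Poisson-length spans.

    Each maximal run of masked positions collapses into a single MASK token,
    so the model must also infer how many tokens are missing.
    """
    n = len(tokens)
    budget = round(n * mask_ratio)  # total number of tokens to mask
    masked = [False] * n
    num_masked = 0
    while num_masked < budget:
        # Simplification (assumption): clamp span length to [1, remaining budget].
        length = min(max(1, sample_poisson(lam, rng)), budget - num_masked)
        start = rng.randrange(n - length + 1)
        for i in range(start, start + length):
            if not masked[i]:  # overlapping spans only count new positions
                masked[i] = True
                num_masked += 1
    noised = []
    for i, tok in enumerate(tokens):
        if not masked[i]:
            noised.append(tok)
        elif i == 0 or not masked[i - 1]:
            noised.append(MASK)  # first position of a masked run
    return noised


def noise_instance(sentences: list[str],
                   rng: random.Random) -> tuple[list[str], list[str]]:
    """Build a (noised source, original target) pair for denoising."""
    source_tokens = " ".join(permute_sentences(sentences, rng)).split()
    target_tokens = " ".join(sentences).split()
    return text_infill(source_tokens, rng), target_tokens
```

For example, `noise_instance(["the cat sat .", "it was warm ."], random.Random(0))` returns a shuffled, span-masked token list as the encoder input and the untouched text as the decoder target; because the target is always the full original text, the same pre-trained model can later be fine-tuned for translation without architectural changes.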

Anthology ID: 2020.tacl-1.47
Volume: Transactions of the Association for Computational Linguistics, Volume 8
Year: 2020
Address: Cambridge, MA
Editors: Mark Johnson, Brian Roark, Ani Nenkova
Venue: TACL
Publisher: MIT Press
Pages: 726–742
URL: https://aclanthology.org/2020.tacl-1.47
DOI: 10.1162/tacl_a_00343
Cite (ACL): Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. 2020. Multilingual Denoising Pre-training for Neural Machine Translation. Transactions of the Association for Computational Linguistics, 8:726–742.
Cite (Informal): Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., TACL 2020)
PDF: https://preview.aclanthology.org/landing_page/2020.tacl-1.47.pdf