Back-translation for Large-Scale Multilingual Machine Translation

Baohao Liao, Shahram Khadivi, Sanjika Hewavitharana


Abstract
This paper describes our approach to the shared task on large-scale multilingual machine translation at the Sixth Conference on Machine Translation (WMT-21). In this work, we aim to build a single multilingual translation system under the hypothesis that a universal cross-language representation leads to better multilingual translation performance. We extend the exploration of different back-translation methods from bilingual translation to multilingual translation. Better performance is obtained with the constrained sampling method, which differs from the findings for bilingual translation. In addition, we explore the effect of vocabulary size and of the amount of synthetic data. Surprisingly, smaller vocabularies perform better, and the extensive monolingual English data offers only a modest improvement. We submitted to both small tasks and achieved second place.
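As a rough illustration of the constrained sampling mentioned in the abstract, the sketch below restricts back-translation decoding to sampling from the k most probable next tokens at each step. It assumes a generic autoregressive PyTorch model interface (model(src_tokens, prefix) returning per-position logits); all names are hypothetical and are not taken from the paper or from baohaoliao/multiback.

import torch
import torch.nn.functional as F

def sample_back_translation(model, src_tokens, bos_id, eos_id, k=10, max_len=200):
    """Generate one synthetic sentence with constrained (top-k) sampling:
    at each step, sample only from the k most probable next tokens."""
    ys = torch.full((1, 1), bos_id, dtype=torch.long)
    for _ in range(max_len):
        logits = model(src_tokens, ys)[:, -1, :]          # next-token logits, shape (1, vocab)
        topk_logits, topk_ids = logits.topk(k, dim=-1)    # keep only the k best candidates
        probs = F.softmax(topk_logits, dim=-1)
        choice = torch.multinomial(probs, num_samples=1)  # sample within the top-k set
        next_id = topk_ids.gather(-1, choice)
        ys = torch.cat([ys, next_id], dim=-1)
        if next_id.item() == eos_id:
            break
    return ys

Setting k to the full vocabulary size would recover unrestricted sampling, while k=1 reduces to greedy decoding; the constrained variant sits between the two.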
Anthology ID:
2021.wmt-1.50
Volume:
Proceedings of the Sixth Conference on Machine Translation
Month:
November
Year:
2021
Address:
Online
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Note:
Pages:
418–424
URL:
https://aclanthology.org/2021.wmt-1.50
Cite (ACL):
Baohao Liao, Shahram Khadivi, and Sanjika Hewavitharana. 2021. Back-translation for Large-Scale Multilingual Machine Translation. In Proceedings of the Sixth Conference on Machine Translation, pages 418–424, Online. Association for Computational Linguistics.
Cite (Informal):
Back-translation for Large-Scale Multilingual Machine Translation (Liao et al., WMT 2021)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.wmt-1.50.pdf
Code
 baohaoliao/multiback