Papago’s Submissions to the WMT21 Triangular Translation Task

Jeonghyeok Park; Hyunjoong Kim; Hyunchang Cho

Abstract

This paper describes Naver Papago’s submission to the WMT21 shared triangular MT task, which aims to improve a non-English MT system using tri-language parallel data. The provided parallel data are Russian-Chinese (direct), Russian-English (indirect), and English-Chinese (indirect). The goal of the task is to improve the quality of a Russian-to-Chinese MT system by exploiting both the direct and the indirect parallel resources. Because the direct parallel data are noisy data crawled from the web, we conduct extensive experiments to find effective data filtering methods. Based on the empirical observation that bilingual MT performs better than multilingual MT, together with related experimental results, we approach this task as bilingual MT and transform the two indirect datasets into direct data. In addition, we use the Transformer, a robust translation model, as our baseline and integrate several techniques: checkpoint averaging, model ensembling, and re-ranking. Our final system improves over a baseline system by 12.7 BLEU points on the WMT21 triangular MT development set. In the official evaluation on the test set, our system ranked 2nd in terms of BLEU score.
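The abstract does not say which filters were finally adopted. As a rough illustration of what rule-based filtering of noisy web-crawled Russian-Chinese pairs can look like, the Python sketch below applies three common heuristics: a script check on each side, a bound on the character-length ratio, and exact-duplicate removal. The function names (`script_ratio`, `keep_pair`, `filter_corpus`), the Unicode ranges, and the threshold values are hypothetical choices for this example, not the method reported in the paper.

```python
import re

# Hypothetical script checks: Cyrillic block for Russian, CJK Unified Ideographs for Chinese.
CYRILLIC = re.compile(r"[\u0400-\u04FF]")
CJK = re.compile(r"[\u4E00-\u9FFF]")


def script_ratio(text: str, pattern: re.Pattern) -> float:
    """Return the fraction of non-whitespace characters that belong to the given script."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if pattern.match(c)) / len(chars)


def keep_pair(ru: str, zh: str, min_script: float = 0.5, max_len_ratio: float = 5.0) -> bool:
    """Apply hypothetical rule-based checks to one Russian-Chinese sentence pair."""
    ru, zh = ru.strip(), zh.strip()
    if not ru or not zh:
        return False  # drop pairs with an empty side
    # Each side should be written mostly in its expected script.
    if script_ratio(ru, CYRILLIC) < min_script or script_ratio(zh, CJK) < min_script:
        return False
    # Character lengths should not differ too much (the threshold is an assumption).
    longer, shorter = max(len(ru), len(zh)), min(len(ru), len(zh))
    return longer / shorter <= max_len_ratio


def filter_corpus(pairs, **kwargs):
    """Keep pairs that pass keep_pair, dropping exact duplicates after stripping."""
    seen = set()
    kept = []
    for ru, zh in pairs:
        key = (ru.strip(), zh.strip())
        if key in seen or not keep_pair(*key, **kwargs):
            continue
        seen.add(key)
        kept.append(key)
    return kept


if __name__ == "__main__":
    sample = [
        ("Привет, мир!", "你好，世界！"),
        ("Привет, мир!", "你好，世界！"),  # exact duplicate
        ("Hello, world!", "你好，世界！"),  # source side is not Russian
        ("Привет, мир!", ""),  # empty target
    ]
    print(filter_corpus(sample))  # [('Привет, мир!', '你好，世界！')]
```

Rules like these are cheap first-pass filters; in practice the thresholds would need tuning against held-out translation quality, which is the kind of experimentation the abstract refers to.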

Anthology ID:: 2021.wmt-1.40
Volume:: Proceedings of the Sixth Conference on Machine Translation
Month:: November
Year:: 2021
Address:: Online
Editors:: Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Tom Kocmi, Andre Martins, Makoto Morishita, Christof Monz
Venue:: WMT
SIG:: SIGMT
Publisher:: Association for Computational Linguistics
Pages:: 341–346
URL:: https://preview.aclanthology.org/ingest-emnlp/2021.wmt-1.40/
Cite (ACL):: Jeonghyeok Park, Hyunjoong Kim, and Hyunchang Cho. 2021. Papago’s Submissions to the WMT21 Triangular Translation Task. In Proceedings of the Sixth Conference on Machine Translation, pages 341–346, Online. Association for Computational Linguistics.
Cite (Informal):: Papago’s Submissions to the WMT21 Triangular Translation Task (Park et al., WMT 2021)
PDF:: https://preview.aclanthology.org/ingest-emnlp/2021.wmt-1.40.pdf
