The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task
Alexandra Chronopoulou, Dario Stojanovski, Viktor Hangya, Alexander Fraser
Abstract
This paper describes the submission of LMU Munich to the WMT 2020 unsupervised shared task, in two language directions, German↔Upper Sorbian. Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al. (2020), using a monolingual pretrained language generation model (on German) and fine-tuning it on both German and Upper Sorbian, before initializing a UNMT model, which is trained with online backtranslation. Pseudo-parallel data obtained from an unsupervised statistical machine translation (USMT) system is used to fine-tune the UNMT model. We also apply BPE-Dropout to the low-resource (Upper Sorbian) data to obtain a more robust system. We additionally experiment with residual adapters and find them useful in the Upper Sorbian→German direction. We explore sampling during backtranslation and curriculum learning to use SMT translations in a more principled way. Finally, we ensemble our best-performing systems and reach a BLEU score of 32.4 on German→Upper Sorbian and 35.2 on Upper Sorbian→German.
- Anthology ID: 2020.wmt-1.128
- Volume: Proceedings of the Fifth Conference on Machine Translation
- Month: November
- Year: 2020
- Address: Online
- Venue: WMT
- SIG: SIGMT
- Publisher: Association for Computational Linguistics
- Pages: 1084–1091
- URL: https://aclanthology.org/2020.wmt-1.128
- Cite (ACL): Alexandra Chronopoulou, Dario Stojanovski, Viktor Hangya, and Alexander Fraser. 2020. The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task. In Proceedings of the Fifth Conference on Machine Translation, pages 1084–1091, Online. Association for Computational Linguistics.
- Cite (Informal): The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task (Chronopoulou et al., WMT 2020)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2020.wmt-1.128.pdf
- Code: alexandra-chron/umt-lmu-wmt2020