Abstract
We participated in the WMT 2018 shared news translation task in three language pairs: English-Estonian, English-Finnish, and English-Czech. Our main focus was the low-resource language pair of Estonian and English, for which we utilized Finnish parallel data in a simple method: we first train a “parent model” for the high-resource language pair and then adapt it on the related low-resource language pair. This approach brings a substantial performance boost over the baseline system trained only on Estonian-English parallel data. Our systems are based on the Transformer architecture. For English-to-Czech translation, we evaluated our models from last year, which use a hybrid phrase-based approach and neural machine translation, mainly for comparison purposes.
- Anthology ID:
- W18-6416
- Volume:
- Proceedings of the Third Conference on Machine Translation: Shared Task Papers
- Month:
- October
- Year:
- 2018
- Address:
- Belgium, Brussels
- Editors:
- Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 431–437
- URL:
- https://aclanthology.org/W18-6416
- DOI:
- 10.18653/v1/W18-6416
- Cite (ACL):
- Tom Kocmi, Roman Sudarikov, and Ondřej Bojar. 2018. CUNI Submissions in WMT18. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 431–437, Belgium, Brussels. Association for Computational Linguistics.
- Cite (Informal):
- CUNI Submissions in WMT18 (Kocmi et al., WMT 2018)
- PDF:
- https://preview.aclanthology.org/landing_page/W18-6416.pdf
- Data
- WMT 2018