CUNI Submissions in WMT18

Tom Kocmi, Roman Sudarikov, Ondřej Bojar


Abstract
We participated in the WMT 2018 shared news translation task in three language pairs: English-Estonian, English-Finnish, and English-Czech. Our main focus was the low-resource Estonian-English pair, for which we exploited Finnish parallel data with a simple method: we first train a “parent model” on the high-resource pair (Finnish-English) and then adapt it on the related low-resource pair (Estonian-English). This approach brings a substantial performance boost over the baseline system trained only on Estonian-English parallel data. Our systems are based on the Transformer architecture. For English-to-Czech translation, we evaluated our models from last year, a hybrid phrase-based system and a neural machine translation system, mainly for comparison purposes.
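The parent-then-child schedule described in the abstract amounts to warm-starting: the child model begins training from the parent's learned parameters instead of from scratch. The sketch below is a toy analogy only, not the authors' Transformer setup. It assumes a hypothetical one-weight linear model trained by gradient descent, with a plentiful "parent" task and a related "child" task that has only three examples and a small step budget. All names and data (`train`, `mse`, `parent_data`, `child_data`) are invented for illustration.

```python
"""Toy analogy of parent -> child transfer (warm-starting); NOT the authors' code.

Hypothetical stand-in: a one-weight linear model y = w * x trained by
full-batch gradient descent on squared error.
"""
from typing import List, Tuple

Pair = Tuple[float, float]


def train(data: List[Pair], w_init: float, steps: int, lr: float = 0.01) -> float:
    """Gradient descent on mean squared error, starting from w_init."""
    w = w_init
    n = len(data)
    for _ in range(steps):
        # d/dw of mean (w*x - y)^2 is mean of 2*x*(w*x - y)
        grad = sum(2 * x * (w * x - y) for x, y in data) / n
        w -= lr * grad
    return w


def mse(data: List[Pair], w: float) -> float:
    """Mean squared error of the model y = w * x on data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical data: a large "parent" task (y = 2.0 x) and a small,
# related "child" task (y = 2.2 x).
parent_data = [(k / 10, 2.0 * k / 10) for k in range(1, 101)]
child_data = [(x, 2.2 * x) for x in (1.0, 2.0, 3.0)]

# 1) Train the parent model on the high-resource task from scratch.
w_parent = train(parent_data, w_init=0.0, steps=500)

# 2) Adapt: continue training on the child task from the parent's weight...
w_transfer = train(child_data, w_init=w_parent, steps=5)
# ...versus a baseline trained only on the child data from scratch.
w_baseline = train(child_data, w_init=0.0, steps=5)

print(f"transfer MSE={mse(child_data, w_transfer):.4f}  "
      f"baseline MSE={mse(child_data, w_baseline):.4f}")
```

With the same small budget on the child data, the warm-started model ends much closer to the child optimum than the from-scratch baseline, because the parent task already placed it near a good solution. This mirrors the intuition behind the abstract's claim; it says nothing about the magnitude of the gains reported in the paper.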
Anthology ID:
W18-6416
Volume:
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Month:
October
Year:
2018
Address:
Belgium, Brussels
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
431–437
URL:
https://aclanthology.org/W18-6416
DOI:
10.18653/v1/W18-6416
Cite (ACL):
Tom Kocmi, Roman Sudarikov, and Ondřej Bojar. 2018. CUNI Submissions in WMT18. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 431–437, Belgium, Brussels. Association for Computational Linguistics.
Cite (Informal):
CUNI Submissions in WMT18 (Kocmi et al., WMT 2018)
PDF:
https://preview.aclanthology.org/landing_page/W18-6416.pdf
Data
WMT 2018