CUNI Submissions in WMT18

Tom Kocmi, Roman Sudarikov, Ondřej Bojar


Abstract
We participated in the WMT 2018 shared news translation task in three language pairs: English-Estonian, English-Finnish, and English-Czech. Our main focus was the low-resource language pair of Estonian and English, for which we utilized Finnish parallel data in a simple method. We first train a “parent model” for the high-resource language pair and then adapt it to the related low-resource language pair. This approach brings a substantial performance boost over the baseline system trained only on Estonian-English parallel data. Our systems are based on the Transformer architecture. For English-to-Czech translation, we evaluated our models from last year, based on a hybrid phrase-based approach and neural machine translation, mainly for comparison purposes.
Anthology ID:
W18-6416
Volume:
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Month:
October
Year:
2018
Address:
Belgium, Brussels
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
431–437
URL:
https://aclanthology.org/W18-6416
DOI:
10.18653/v1/W18-6416
Cite (ACL):
Tom Kocmi, Roman Sudarikov, and Ondřej Bojar. 2018. CUNI Submissions in WMT18. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 431–437, Belgium, Brussels. Association for Computational Linguistics.
Cite (Informal):
CUNI Submissions in WMT18 (Kocmi et al., WMT 2018)
PDF:
https://preview.aclanthology.org/auto-file-uploads/W18-6416.pdf
Data
WMT 2018