The University of Helsinki Submissions to the WMT19 News Translation Task

Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann


Abstract
In this paper, we present the University of Helsinki submissions to the WMT 2019 shared news translation task in three language pairs: English-German, English-Finnish and Finnish-English. This year we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained sentence-level Transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish, we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.
Anthology ID:
W19-5347
Volume:
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
Month:
August
Year:
2019
Address:
Florence, Italy
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
412–423
URL:
https://aclanthology.org/W19-5347
DOI:
10.18653/v1/W19-5347
Cite (ACL):
Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, and Jörg Tiedemann. 2019. The University of Helsinki Submissions to the WMT19 News Translation Task. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 412–423, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
The University of Helsinki Submissions to the WMT19 News Translation Task (Talman et al., WMT 2019)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W19-5347.pdf