The University of Helsinki Submissions to the WMT19 News Translation Task
Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, Jörg Tiedemann
Abstract
In this paper, we present the University of Helsinki submissions to the WMT 2019 shared news translation task in three language pairs: English-German, English-Finnish and Finnish-English. This year we focused first on cleaning and filtering the training data using multiple data-filtering approaches, resulting in much smaller and cleaner training sets. For English-German, we trained sentence-level transformer models and compared different document-level translation approaches. For Finnish-English and English-Finnish, we focused on different segmentation approaches, and we also included a rule-based system for English-Finnish.
- Anthology ID:
- W19-5347
- Volume:
- Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 412–423
- URL:
- https://aclanthology.org/W19-5347
- DOI:
- 10.18653/v1/W19-5347
- Cite (ACL):
- Aarne Talman, Umut Sulubacak, Raúl Vázquez, Yves Scherrer, Sami Virpioja, Alessandro Raganato, Arvi Hurskainen, and Jörg Tiedemann. 2019. The University of Helsinki Submissions to the WMT19 News Translation Task. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 412–423, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- The University of Helsinki Submissions to the WMT19 News Translation Task (Talman et al., WMT 2019)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W19-5347.pdf