Neural Machine Translation for English–Kazakh with Morphological Segmentation and Synthetic Data
Antonio Toral, Lukas Edman, Galiya Yeshmagambetova, Jennifer Spenader
Abstract
This paper presents the systems submitted by the University of Groningen to the English–Kazakh language pair (both translation directions) for the WMT 2019 news translation task. We explore the potential benefits of (i) morphological segmentation (both unsupervised and rule-based), given the agglutinative nature of Kazakh, (ii) data from two additional languages (Turkish and Russian), given the scarcity of English–Kazakh data and (iii) synthetic data, both for the source and for the target language. Our best submissions ranked second for Kazakh→English and third for English→Kazakh in terms of the BLEU automatic evaluation metric.
- Anthology ID:
- W19-5343
- Volume:
- Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Marco Turchi, Karin Verspoor
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 386–392
- URL:
- https://aclanthology.org/W19-5343
- DOI:
- 10.18653/v1/W19-5343
- Cite (ACL):
- Antonio Toral, Lukas Edman, Galiya Yeshmagambetova, and Jennifer Spenader. 2019. Neural Machine Translation for English–Kazakh with Morphological Segmentation and Synthetic Data. In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 386–392, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Neural Machine Translation for English–Kazakh with Morphological Segmentation and Synthetic Data (Toral et al., WMT 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/W19-5343.pdf