Prompsit’s Submission to the IWSLT 2018 Low Resource Machine Translation Task

Víctor M. Sánchez-Cartagena


Abstract
This paper presents Prompsit Language Engineering’s submission to the IWSLT 2018 Low Resource Machine Translation task. Our submission is based on cross-lingual learning: a multilingual neural machine translation system was created with the sole purpose of improving translation quality on the Basque-to-English language pair. The multilingual system was trained on a combination of in-domain data, pseudo in-domain data obtained via cross-entropy data selection, and backtranslated data. We morphologically segmented Basque text with a novel approach that only requires a dictionary such as those used by spell checkers, and showed that this segmentation approach outperforms the widespread byte pair encoding strategy for this task.
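The abstract does not say how the cross-entropy data selection was carried out. As a rough illustration only, the sketch below uses one common formulation, cross-entropy difference scoring, in which each general-domain sentence is scored by its cross-entropy under an in-domain language model minus its cross-entropy under a general-domain model, and the lowest-scoring sentences are kept as pseudo in-domain data. The add-one-smoothed unigram models, the function names, and the toy sentences are all assumptions made for this example and are not taken from the submission.

```python
import math
from collections import Counter


def unigram_model(sentences):
    """Return an add-one-smoothed unigram log-probability function."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a sentence under a model."""
    tokens = sentence.split()
    if not tokens:
        return 0.0
    return -sum(logprob(t) for t in tokens) / len(tokens)


def select_pseudo_in_domain(in_domain, general, keep):
    """Rank general-domain sentences by H_in(s) - H_gen(s); keep the lowest."""
    lm_in = unigram_model(in_domain)
    lm_gen = unigram_model(general)
    ranked = sorted(
        general,
        key=lambda s: cross_entropy(lm_in, s) - cross_entropy(lm_gen, s),
    )
    return ranked[:keep]


# Hypothetical toy corpora: talk-like in-domain text vs. a mixed general pool.
in_domain = ["thank you very much", "so this talk is about ideas"]
general = [
    "the council adopted the regulation",
    "thank you for this talk",
    "the regulation entered into force",
]
selected = select_pseudo_in_domain(in_domain, general, keep=1)
print(selected)
```

In practice the unigram models would be replaced by stronger language models and the cut-off chosen on held-out data; the ranking criterion is the part this sketch is meant to show.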
Anthology ID:
2018.iwslt-1.14
Volume:
Proceedings of the 15th International Conference on Spoken Language Translation
Month:
October 29-30
Year:
2018
Address:
Brussels
Editors:
Marco Turchi, Jan Niehues, Marcello Federico
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
International Conference on Spoken Language Translation
Pages:
95–103
URL:
https://aclanthology.org/2018.iwslt-1.14
Cite (ACL):
Víctor M. Sánchez-Cartagena. 2018. Prompsit’s Submission to the IWSLT 2018 Low Resource Machine Translation Task. In Proceedings of the 15th International Conference on Spoken Language Translation, pages 95–103, Brussels. International Conference on Spoken Language Translation.
Cite (Informal):
Prompsit’s Submission to the IWSLT 2018 Low Resource Machine Translation Task (Sánchez-Cartagena, IWSLT 2018)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2018.iwslt-1.14.pdf