Abstract
This paper presents Prompsit Language Engineering’s submission to the IWSLT 2018 Low Resource Machine Translation task. Our submission is based on cross-lingual learning: a multilingual neural machine translation system was created with the sole purpose of improving translation quality on the Basque-to-English language pair. The multilingual system was trained on a combination of in-domain data, pseudo in-domain data obtained via cross-entropy data selection, and backtranslated data. We morphologically segmented Basque text with a novel approach that only requires a dictionary such as those used by spell checkers, and showed that this segmentation approach outperforms the widespread byte pair encoding strategy for this task.
- Anthology ID:
- 2018.iwslt-1.14
- Volume:
- Proceedings of the 15th International Conference on Spoken Language Translation
- Month:
- October 29-30
- Year:
- 2018
- Address:
- Brussels
- Editors:
- Marco Turchi, Jan Niehues, Marcello Federico
- Venue:
- IWSLT
- SIG:
- SIGSLT
- Publisher:
- International Conference on Spoken Language Translation
- Pages:
- 95–103
- URL:
- https://aclanthology.org/2018.iwslt-1.14
- Cite (ACL):
- Víctor M. Sánchez-Cartagena. 2018. Prompsit’s Submission to the IWSLT 2018 Low Resource Machine Translation Task. In Proceedings of the 15th International Conference on Spoken Language Translation, pages 95–103, Brussels. International Conference on Spoken Language Translation.
- Cite (Informal):
- Prompsit’s Submission to the IWSLT 2018 Low Resource Machine Translation Task (Sánchez-Cartagena, IWSLT 2018)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2018.iwslt-1.14.pdf
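The abstract mentions obtaining pseudo in-domain data via cross-entropy data selection but does not give its details. The snippet below is a minimal sketch of the widely used cross-entropy difference criterion: each general-domain sentence is scored by its per-token cross-entropy under an in-domain language model minus its cross-entropy under a general-domain model, and the lowest-scoring sentences are kept. For brevity it assumes add-one smoothed unigram models trained on whitespace-tokenized text. These choices, and every function name used here (`train_unigram`, `cross_entropy`, `select_pseudo_in_domain`), are illustrative assumptions and not the submission's actual configuration, which would typically rely on higher-order n-gram models.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Assumed simplification: add-one smoothed unigram LM, returns a log-prob function."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens
    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab))
    return logprob

def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a sentence under a language model."""
    tokens = sentence.split()
    return -sum(logprob(t) for t in tokens) / max(len(tokens), 1)

def select_pseudo_in_domain(in_domain, general, k):
    """Rank general-domain sentences by H_in(s) - H_gen(s); keep the k lowest."""
    lm_in = train_unigram(in_domain)
    lm_gen = train_unigram(general)
    ranked = sorted(
        general,
        key=lambda s: cross_entropy(lm_in, s) - cross_entropy(lm_gen, s),
    )
    return ranked[:k]

in_domain = ["the talk was about climate", "the speaker talked about science"]
general = [
    "the speaker talked about climate science",
    "stock prices fell sharply today",
    "the talk about science was great",
    "buy cheap shoes online",
]
print(select_pseudo_in_domain(in_domain, general, 2))
```

A lower score means the sentence looks more like the in-domain data than like typical general-domain text, so subtracting the general-domain cross-entropy avoids simply favoring short or generically frequent sentences. In practice the general-domain model is often trained on a random sample of the general corpus of comparable size to the in-domain data.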