Neural Machine Translation between similar South-Slavic languages

Maja Popović; Alberto Poncelas

Abstract

This paper describes the ADAPT-DCU machine translation systems built for the WMT 2020 shared task on Similar Language Translation. We explored several NMT set-ups for the Croatian–Slovenian and Serbian–Slovenian language pairs in both translation directions. Our experiments focus on different amounts and types of training data: we first apply basic filtering to the OpenSubtitles training corpora, then perform additional cleaning of the remaining misaligned segments based on character n-gram matching. Finally, we make use of additional monolingual data by creating synthetic parallel data through back-translation. Automatic evaluation shows that multilingual systems trained on joint Serbian and Croatian data outperform bilingual ones, and that character-based cleaning improves scores while using less data. The results also confirm once more that adding back-translated data further improves performance, especially when the synthetic data is similar to the desired domain of the development and test sets. This, however, might come at the price of prolonged training time, especially for multitarget systems.
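The abstract does not spell out how the character n-gram matching is computed, so the following is only a minimal sketch of one plausible reading: because the languages are closely related, a correctly aligned source and target segment should share many character n-grams, and pairs with very low overlap can be discarded as likely misalignments. The function names, the Dice-style overlap score, the trigram order and the 0.2 threshold are all assumptions made for illustration, not the authors' actual settings.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_overlap(src, tgt, n=3):
    """Dice-style overlap of the two character n-gram multisets, in [0, 1]."""
    a, b = char_ngrams(src, n), char_ngrams(tgt, n)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum((a & b).values())  # multiset intersection
    return 2 * shared / total


def filter_pairs(pairs, threshold=0.2, n=3):
    """Keep only segment pairs whose overlap reaches the (hypothetical) threshold."""
    return [(s, t) for s, t in pairs if ngram_overlap(s, t, n) >= threshold]


if __name__ == "__main__":
    pairs = [
        ("Hvala lijepa", "Hvala lepa"),          # plausible alignment
        ("Hvala lijepa", "Kje si bil včeraj?"),  # likely misaligned
    ]
    for src, tgt in filter_pairs(pairs):
        print(src, "|||", tgt)
```

A score of this kind only makes sense for similar languages written in the same script; for distant pairs, or for Serbian text in Cyrillic, the segments would first need transliteration or a different alignment signal.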

Anthology ID: 2020.wmt-1.51
Volume: Proceedings of the Fifth Conference on Machine Translation
Month: November
Year: 2020
Address: Online
Venue: WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 430–436
URL: https://aclanthology.org/2020.wmt-1.51
Cite (ACL): Maja Popović and Alberto Poncelas. 2020. Neural Machine Translation between similar South-Slavic languages. In Proceedings of the Fifth Conference on Machine Translation, pages 430–436, Online. Association for Computational Linguistics.
Cite (Informal): Neural Machine Translation between similar South-Slavic languages (Popović & Poncelas, WMT 2020)
PDF: https://preview.aclanthology.org/ingestion-script-update/2020.wmt-1.51.pdf
