Neural Machine Translation of Low-Resource and Similar Languages with Backtranslation

Michael Przystupa, Muhammad Abdul-Mageed


Abstract
We present our contribution to the WMT19 Similar Language Translation shared task. We investigate the utility of neural machine translation on three low-resource, similar language pairs: Spanish – Portuguese, Czech – Polish, and Hindi – Nepali. Since state-of-the-art neural machine translation systems still require large amounts of bitext, which we do not have for the pairs we consider, we focus primarily on incorporating monolingual data into our models with backtranslation. In our analysis, we found Transformer models to work best on Spanish – Portuguese and Czech – Polish translation, whereas LSTMs with global attention worked best on Hindi – Nepali translation.
Anthology ID:
W19-5431
Volume:
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, André Martins, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
224–235
URL:
https://aclanthology.org/W19-5431
DOI:
10.18653/v1/W19-5431
Cite (ACL):
Michael Przystupa and Muhammad Abdul-Mageed. 2019. Neural Machine Translation of Low-Resource and Similar Languages with Backtranslation. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 224–235, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Neural Machine Translation of Low-Resource and Similar Languages with Backtranslation (Przystupa & Abdul-Mageed, WMT 2019)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/W19-5431.pdf