Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine Translation

Inigo Jauregi Unanue, Massimo Piccardi


Abstract
This paper describes the machine translation systems proposed by the University of Technology Sydney Natural Language Processing (UTS_NLP) team for the WMT20 English-Basque biomedical translation tasks. Due to the limited parallel corpora available, we propose training a BERT-fused NMT model that leverages pretrained language models. Furthermore, we augment the training corpus by backtranslating monolingual data. Our experiments show that NMT models in low-resource scenarios can benefit from combining these two training techniques, with improvements of up to 6.16 BLEU percentage points in the case of biomedical abstract translations.
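For readers unfamiliar with the backtranslation step mentioned above, the sketch below illustrates the general idea rather than the authors' actual pipeline: a reverse (Basque-to-English) model translates monolingual Basque sentences into synthetic English sources, and each synthetic pair is appended to the English-Basque training data. The Hugging Face checkpoint name, batch size, and example sentence are assumptions chosen purely for illustration.

# Minimal, illustrative backtranslation sketch (not the paper's setup).
# The checkpoint name below is an assumed public Basque->English model;
# the paper trained its own reverse model on the task data.
from transformers import MarianMTModel, MarianTokenizer

CKPT = "Helsinki-NLP/opus-mt-eu-en"  # assumed checkpoint, for illustration only
tokenizer = MarianTokenizer.from_pretrained(CKPT)
model = MarianMTModel.from_pretrained(CKPT)

def backtranslate(basque_sentences, batch_size=16):
    """Turn monolingual Basque sentences into (synthetic English, Basque) pairs."""
    synthetic_pairs = []
    for i in range(0, len(basque_sentences), batch_size):
        batch = basque_sentences[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, num_beams=4, max_length=256)
        english = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        # Each synthetic pair can be added to the EN->EU training corpus.
        synthetic_pairs.extend(zip(english, batch))
    return synthetic_pairs

if __name__ == "__main__":
    mono_eu = ["Gaixoak tratamendu berria jaso zuen ospitalean."]  # example sentence (assumption)
    for en, eu in backtranslate(mono_eu):
        print(en, "=>", eu)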
Anthology ID: 2020.wmt-1.89
Volume: Proceedings of the Fifth Conference on Machine Translation
Month: November
Year: 2020
Address: Online
Venues: EMNLP | WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 826–832
URL: https://aclanthology.org/2020.wmt-1.89
Cite (ACL):
Inigo Jauregi Unanue and Massimo Piccardi. 2020. Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine Translation. In Proceedings of the Fifth Conference on Machine Translation, pages 826–832, Online. Association for Computational Linguistics.
Cite (Informal):
Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine Translation (Jauregi Unanue & Piccardi, WMT 2020)
PDF: https://preview.aclanthology.org/update-css-js/2020.wmt-1.89.pdf
Video: https://slideslive.com/38939562