Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine Translation

Inigo Jauregi Unanue, Massimo Piccardi


Abstract
This paper describes the machine translation systems proposed by the University of Technology Sydney Natural Language Processing (UTS_NLP) team for the WMT20 English-Basque biomedical translation tasks. Due to the limited parallel corpora available, we propose training a BERT-fused NMT model that leverages pretrained language models. Furthermore, we augment the training corpus by backtranslating monolingual data. Our experiments show that NMT models in low-resource scenarios can benefit from combining these two training techniques, with improvements of up to 6.16 BLEU percentage points in the case of biomedical abstract translations.
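For readers unfamiliar with the backtranslation step mentioned above, the sketch below illustrates the general idea rather than the authors' actual pipeline: a reverse (Basque-to-English) model translates monolingual Basque sentences into synthetic English sources, and each synthetic pair is appended to the English-Basque training data. The Hugging Face checkpoint name, batch size, and example sentence are assumptions chosen purely for illustration.

# Minimal, illustrative backtranslation sketch (not the paper's setup).
# The checkpoint name below is an assumed public Basque->English model;
# the paper trained its own reverse model on the task data.
from transformers import MarianMTModel, MarianTokenizer

CKPT = "Helsinki-NLP/opus-mt-eu-en"  # assumed checkpoint, for illustration only
tokenizer = MarianTokenizer.from_pretrained(CKPT)
model = MarianMTModel.from_pretrained(CKPT)

def backtranslate(basque_sentences, batch_size=16):
    """Turn monolingual Basque sentences into (synthetic English, Basque) pairs."""
    synthetic_pairs = []
    for i in range(0, len(basque_sentences), batch_size):
        batch = basque_sentences[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, num_beams=4, max_length=256)
        english = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        # Each synthetic pair can be added to the EN->EU training corpus.
        synthetic_pairs.extend(zip(english, batch))
    return synthetic_pairs

if __name__ == "__main__":
    mono_eu = ["Gaixoak tratamendu berria jaso zuen ospitalean."]  # example sentence (assumption)
    for en, eu in backtranslate(mono_eu):
        print(en, "=>", eu)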
Anthology ID: 2020.wmt-1.89
Volume: Proceedings of the Fifth Conference on Machine Translation
Month: November
Year: 2020
Address: Online
Venues: EMNLP | WMT
SIG: SIGMT
Publisher: Association for Computational Linguistics
Pages: 826–832
URL: https://aclanthology.org/2020.wmt-1.89
Cite (ACL):
Inigo Jauregi Unanue and Massimo Piccardi. 2020. Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine Translation. In Proceedings of the Fifth Conference on Machine Translation, pages 826–832, Online. Association for Computational Linguistics.
Cite (Informal):
Pretrained Language Models and Backtranslation for English-Basque Biomedical Neural Machine Translation (Jauregi Unanue & Piccardi, WMT 2020)
PDF: https://preview.aclanthology.org/update-css-js/2020.wmt-1.89.pdf
Video: https://slideslive.com/38939562