Elhuyar submission to the Biomedical Translation Task 2020 on terminology and abstracts translation

Ander Corral, Xabier Saralegi


Abstract
This article describes the systems submitted by Elhuyar to the 2020 Biomedical Translation Shared Task, specifically those for the English-Basque terminology translation subtask and for the English-Basque and English-Spanish abstract translation subtasks. In all cases we chose a Transformer architecture and studied different strategies for combining open-domain data with biomedical-domain data to build the training corpora. For the English-Basque pair, given the scarcity of parallel corpora in the biomedical domain, we set out to create in-domain training data synthetically. The systems submitted to the terminology and abstract translation subtasks for the English-Basque language pair ranked first in their respective tasks among four participants, achieving an accuracy of 0.78 for terminology translation and a BLEU of 0.1279 for abstract translation. In the abstract translation task for the English-Spanish pair, our team ranked second (BLEU=0.4498) in the case of OK sentences.
Anthology ID:
2020.wmt-1.87
Volume:
Proceedings of the Fifth Conference on Machine Translation
Month:
November
Year:
2020
Address:
Online
Venues:
EMNLP | WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
813–819
URL:
https://aclanthology.org/2020.wmt-1.87
Cite (ACL):
Ander Corral and Xabier Saralegi. 2020. Elhuyar submission to the Biomedical Translation Task 2020 on terminology and abstracts translation. In Proceedings of the Fifth Conference on Machine Translation, pages 813–819, Online. Association for Computational Linguistics.
Cite (Informal):
Elhuyar submission to the Biomedical Translation Task 2020 on terminology and abstracts translation (Corral & Saralegi, WMT 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.wmt-1.87.pdf
Video:
https://slideslive.com/38939591