Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task

Casimiro Pio Carrino, Bardia Rafieian, Marta R. Costa-jussà, José A. R. Fonollosa


Abstract
In this work, we describe the TALP-UPC systems submitted to the WMT19 Biomedical Translation Task. Our proposed strategy is NMT model-independent and relies on a single ingredient: a biomedical terminology list. We first extracted this terminology list by labelling biomedical words in our training dataset using the BabelNet API. Then, we designed a data preparation strategy that inserts term information at the token level. Finally, we trained a Transformer model on this term-informed data. Our best submitted system ranked 2nd for Spanish-English and 3rd for English-Spanish.
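The token-level data preparation step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on stated assumptions: terms are matched against a lowercase word set, "terminology-aware segmentation" is taken to mean that terminology words are left unsplit by the subword segmenter, and the domain feature is a binary per-token flag (1 for a term, 0 otherwise). The names `toy_subword_split` and `term_aware_segment` are hypothetical, and the fixed-length splitter is only a stand-in for a real segmenter such as BPE.

```python
from typing import Callable, List, Set, Tuple


def toy_subword_split(word: str, max_len: int = 4) -> List[str]:
    """Hypothetical stand-in for a real subword segmenter (e.g. BPE):
    cut the word into fixed-size chunks and mark every non-final chunk
    with the '@@' continuation suffix."""
    pieces = [word[i:i + max_len] for i in range(0, len(word), max_len)]
    return [p + "@@" for p in pieces[:-1]] + pieces[-1:]


def term_aware_segment(
    tokens: List[str],
    terminology: Set[str],
    split: Callable[[str], List[str]] = toy_subword_split,
) -> List[Tuple[str, int]]:
    """Return (subword, feature) pairs under an assumed scheme:
    terminology words are kept whole and flagged with feature 1,
    all other words are segmented and flagged with feature 0."""
    out: List[Tuple[str, int]] = []
    for tok in tokens:
        if tok.lower() in terminology:
            # Term: keep the whole word so its meaning is not fragmented.
            out.append((tok, 1))
        else:
            # Non-term: segment normally; every piece gets feature 0.
            out.extend((piece, 0) for piece in split(tok))
    return out


if __name__ == "__main__":
    sentence = "the patient received insulin therapy".split()
    print(term_aware_segment(sentence, {"insulin", "therapy"}))
```

On the example sentence, "insulin" and "therapy" stay whole with feature 1, while "patient" and "received" are split into `@@`-marked pieces with feature 0. Keeping the feature as a separate per-token value, rather than rewriting the words, is one way such a scheme could remain independent of the NMT architecture.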
Anthology ID:
W19-5418
Volume:
Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
Month:
August
Year:
2019
Address:
Florence, Italy
Venues:
ACL | WMT | WS
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
151–155
URL:
https://aclanthology.org/W19-5418
DOI:
10.18653/v1/W19-5418
Cite (ACL):
Casimiro Pio Carrino, Bardia Rafieian, Marta R. Costa-jussà, and José A. R. Fonollosa. 2019. Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 151–155, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task (Carrino et al., 2019)
PDF:
https://preview.aclanthology.org/update-css-js/W19-5418.pdf