Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task
Casimiro Pio Carrino, Bardia Rafieian, Marta R. Costa-jussà, José A. R. Fonollosa
Abstract
In this work, we describe the TALP-UPC systems submitted to the WMT19 Biomedical Translation Task. Our proposed strategy is independent of the NMT model and relies on a single ingredient: a biomedical terminology list. We first extracted this terminology list by labelling biomedical words in our training dataset using the BabelNet API. We then designed a data preparation strategy to insert the term information at the token level. Finally, we trained the Transformer model on this term-informed data. Our best submitted system ranked 2nd and 3rd for the Spanish-English and English-Spanish translation directions, respectively.
- Anthology ID:
- W19-5418
- Volume:
- Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2)
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 151–155
- URL:
- https://aclanthology.org/W19-5418
- DOI:
- 10.18653/v1/W19-5418
- Cite (ACL):
- Casimiro Pio Carrino, Bardia Rafieian, Marta R. Costa-jussà, and José A. R. Fonollosa. 2019. Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task. In Proceedings of the Fourth Conference on Machine Translation (Volume 3: Shared Task Papers, Day 2), pages 151–155, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Terminology-Aware Segmentation and Domain Feature for the WMT19 Biomedical Translation Task (Carrino et al., WMT 2019)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W19-5418.pdf
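
The abstract mentions inserting term information at the token level using a biomedical terminology list. The following is a minimal sketch of what such a data preparation step could look like. It is an illustration under stated assumptions, not the authors' implementation: the function name `tag_terms`, the `|` separator, the `<bio>` and `<gen>` tags, and the hard-coded term set are all hypothetical, and the terminology extraction with the BabelNet API is not reproduced here.

```python
from typing import Iterable, List, Set


def tag_terms(
    tokens: Iterable[str],
    terminology: Set[str],
    term_tag: str = "<bio>",   # hypothetical tag for in-domain terms
    other_tag: str = "<gen>",  # hypothetical tag for all other tokens
) -> List[str]:
    """Attach a domain tag to every token (assumed format: token|tag)."""
    # Case-insensitive lookup against the terminology list.
    lowered = {term.lower() for term in terminology}
    return [
        f"{tok}|{term_tag if tok.lower() in lowered else other_tag}"
        for tok in tokens
    ]


# Toy terminology list; in the paper this list is obtained via BabelNet.
terms = {"insulin", "patient"}
sentence = "the patient received insulin daily".split()
tagged = tag_terms(sentence, terms)
print(" ".join(tagged))
```

Because the tagging happens entirely in the training data, a step like this leaves the translation model untouched, which is consistent with the abstract's claim that the strategy is independent of the NMT model.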