Translation of Biomedical Documents with Focus on Spanish-English

Mirela-Stefania Duma, Wolfgang Menzel


Abstract
For the WMT 2018 shared task on translating documents in the biomedical domain, we developed a scoring formula that uses a simple yet effective method of weighting term frequencies and integrated it into a data selection pipeline. The method was applied to five language pairs and performed best on Portuguese-English, where a BLEU score of 41.84 placed it third out of seven runs submitted by three institutions. In this paper, we describe our method and results, with a special focus on Spanish-English, where we compare it against a state-of-the-art method. Our contribution to the task lies in introducing a fast, unsupervised method for selecting domain-specific data for training models that obtain good results using only 10% of the general domain data.
Anthology ID:
W18-6444
Volume:
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Month:
October
Year:
2018
Address:
Belgium, Brussels
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
637–643
URL:
https://aclanthology.org/W18-6444
DOI:
10.18653/v1/W18-6444
Cite (ACL):
Mirela-Stefania Duma and Wolfgang Menzel. 2018. Translation of Biomedical Documents with Focus on Spanish-English. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 637–643, Belgium, Brussels. Association for Computational Linguistics.
Cite (Informal):
Translation of Biomedical Documents with Focus on Spanish-English (Duma & Menzel, WMT 2018)
PDF:
https://preview.aclanthology.org/ingestion-script-update/W18-6444.pdf