Translation of Biomedical Documents with Focus on Spanish-English

Mirela-Stefania Duma, Wolfgang Menzel


Abstract
For the WMT 2018 shared task on translating documents in the biomedical domain, we developed a scoring formula that weights term frequencies in a simple yet effective way and integrated it into a data selection pipeline. The method was applied to five language pairs and performed best on Portuguese-English, where a BLEU score of 41.84 placed it third out of seven runs submitted by three institutions. In this paper, we describe our method and results, with a special focus on Spanish-English, where we compare it against a state-of-the-art method. Our contribution to the task is a fast, unsupervised method for selecting domain-specific training data; models trained on the selected data obtain good results using only 10% of the general-domain data.
Anthology ID:
W18-6444
Volume:
Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Month:
October
Year:
2018
Address:
Belgium, Brussels
Editors:
Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
Venue:
WMT
SIG:
SIGMT
Publisher:
Association for Computational Linguistics
Pages:
637–643
URL:
https://aclanthology.org/W18-6444
DOI:
10.18653/v1/W18-6444
Cite (ACL):
Mirela-Stefania Duma and Wolfgang Menzel. 2018. Translation of Biomedical Documents with Focus on Spanish-English. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 637–643, Belgium, Brussels. Association for Computational Linguistics.
Cite (Informal):
Translation of Biomedical Documents with Focus on Spanish-English (Duma & Menzel, WMT 2018)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/W18-6444.pdf