Translation of Biomedical Documents with Focus on Spanish-English

Mirela-Stefania Duma; Wolfgang Menzel

doi:10.18653/v1/W18-6444


Abstract

For the WMT 2018 shared task on translating biomedical documents, we developed a scoring formula that uses a simple yet effective method of weighting term frequencies and integrated it into a data selection pipeline. The method was applied to five language pairs and performed best on Portuguese-English, where a BLEU score of 41.84 placed it third out of seven runs submitted by three institutions. In this paper, we describe our method and results, with a special focus on Spanish-English, where we compare it against a state-of-the-art method. Our contribution to the task is a fast, unsupervised method for selecting domain-specific training data; models trained on the selected data obtain good results using only 10% of the general-domain data.
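The abstract does not give the scoring formula itself. As a rough, non-authoritative illustration of the general idea of term-frequency-based data selection, the Python sketch below scores each general-domain sentence by how strongly its terms are associated with a small in-domain corpus and keeps the top-scoring fraction (10% by default, mirroring the figure in the abstract). The smoothed log-ratio weighting and the helper names `term_weights`, `score`, and `select` are hypothetical choices made for this example; they are not the authors' formula.

```python
import math
from collections import Counter


def term_weights(in_domain, general):
    """Hypothetical weighting (not the paper's formula): add-one smoothed log
    ratio of a term's relative frequency in-domain vs. in the general corpus."""
    in_counts = Counter(tok for sent in in_domain for tok in sent.split())
    gen_counts = Counter(tok for sent in general for tok in sent.split())
    vocab = set(in_counts) | set(gen_counts)
    in_total = sum(in_counts.values()) + len(vocab)
    gen_total = sum(gen_counts.values()) + len(vocab)
    return {
        tok: math.log((in_counts[tok] + 1) / in_total)
        - math.log((gen_counts[tok] + 1) / gen_total)
        for tok in vocab
    }


def score(sentence, weights):
    """Average term weight of a sentence; empty sentences rank last."""
    toks = sentence.split()
    if not toks:
        return float("-inf")
    return sum(weights.get(t, 0.0) for t in toks) / len(toks)


def select(in_domain, general, fraction=0.1):
    """Keep the top `fraction` of general-domain sentences by score."""
    weights = term_weights(in_domain, general)
    ranked = sorted(general, key=lambda s: score(s, weights), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]


# Toy data, invented purely for illustration.
in_domain = ["the patient received insulin", "insulin dosage for the patient"]
general = [
    "the patient needs insulin therapy",
    "the weather is nice today",
    "stock prices fell sharply",
    "the team won the match",
    "she bought a new car",
    "the train was late again",
    "we watched a film",
    "the market opened higher",
    "he cooked dinner for friends",
    "rain is expected tomorrow",
]

selected = select(in_domain, general)
print(selected)
```

With ten candidate sentences and the default 10% fraction, only the single sentence sharing vocabulary with the in-domain corpus is kept. A real system would tokenize properly and work with far larger corpora; the sketch only shows the shape of a score-then-rank-then-truncate pipeline.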

Anthology ID:: W18-6444
Volume:: Proceedings of the Third Conference on Machine Translation: Shared Task Papers
Month:: October
Year:: 2018
Address:: Belgium, Brussels
Venues:: EMNLP | WMT | WS
SIG:: SIGMT
Publisher:: Association for Computational Linguistics
Pages:: 637–643
URL:: https://aclanthology.org/W18-6444
DOI:: 10.18653/v1/W18-6444
Cite (ACL):: Mirela-Stefania Duma and Wolfgang Menzel. 2018. Translation of Biomedical Documents with Focus on Spanish-English. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 637–643, Belgium, Brussels. Association for Computational Linguistics.
Cite (Informal):: Translation of Biomedical Documents with Focus on Spanish-English (Duma & Menzel, 2018)
PDF:: https://preview.aclanthology.org/update-css-js/W18-6444.pdf
