Abstract
For the WMT 2018 shared task of translating documents pertaining to the biomedical domain, we developed a scoring formula that uses an unsophisticated yet effective method of weighting term frequencies and integrated it into a data selection pipeline. The method was applied to five language pairs and performed best on Portuguese-English, where a BLEU score of 41.84 placed it third out of seven runs submitted by three institutions. In this paper, we describe our method and results with a special focus on Spanish-English, where we compare it against a state-of-the-art method. Our contribution to the task lies in introducing a fast, unsupervised method for selecting domain-specific data for training models that obtain good results using only 10% of the general domain data.
- Anthology ID:
- W18-6444
- Volume:
- Proceedings of the Third Conference on Machine Translation: Shared Task Papers
- Month:
- October
- Year:
- 2018
- Address:
- Belgium, Brussels
- Editors:
- Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 637–643
- URL:
- https://aclanthology.org/W18-6444
- DOI:
- 10.18653/v1/W18-6444
- Cite (ACL):
- Mirela-Stefania Duma and Wolfgang Menzel. 2018. Translation of Biomedical Documents with Focus on Spanish-English. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 637–643, Belgium, Brussels. Association for Computational Linguistics.
- Cite (Informal):
- Translation of Biomedical Documents with Focus on Spanish-English (Duma & Menzel, WMT 2018)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/W18-6444.pdf