Abstract
For the WMT 2018 shared task of translating documents pertaining to the biomedical domain, we developed a scoring formula that uses an unsophisticated yet effective method of weighting term frequencies and was integrated into a data selection pipeline. The method was applied to five language pairs and performed best on Portuguese-English, where a BLEU score of 41.84 placed it third out of seven runs submitted by three institutions. In this paper, we describe our method and results with a special focus on Spanish-English, where we compare it against a state-of-the-art method. Our contribution to the task lies in introducing a fast, unsupervised method for selecting domain-specific data for training models which obtain good results using only 10% of the general domain data.
- Anthology ID:
- W18-6444
- Volume:
- Proceedings of the Third Conference on Machine Translation: Shared Task Papers
- Month:
- October
- Year:
- 2018
- Address:
- Belgium, Brussels
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 637–643
- URL:
- https://aclanthology.org/W18-6444
- DOI:
- 10.18653/v1/W18-6444
- Cite (ACL):
- Mirela-Stefania Duma and Wolfgang Menzel. 2018. Translation of Biomedical Documents with Focus on Spanish-English. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, pages 637–643, Belgium, Brussels. Association for Computational Linguistics.
- Cite (Informal):
- Translation of Biomedical Documents with Focus on Spanish-English (Duma & Menzel, WMT 2018)
- PDF:
- https://preview.aclanthology.org/author-url/W18-6444.pdf