Abstract
This paper presents a method for exploiting document-level similarity between the documents in the training corpus for a corpus-driven (statistical or example-based) machine translation system and the input documents it must translate. The method is simple to implement, efficient (increases the translation time of an example-based system by only a few percent), and robust (still works even when the actual document boundaries in the input text are not known). Experiments on French-English and Arabic-English showed relative gains over the same system without using document-level similarity of up to 7.4% and 5.4%, respectively, on the BLEU metric.
- Anthology ID:
- 2008.amta-papers.2
- Volume:
- Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
- Month:
- October 21-25
- Year:
- 2008
- Address:
- Waikiki, USA
- Venue:
- AMTA
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 46–55
- URL:
- https://aclanthology.org/2008.amta-papers.2
- Cite (ACL):
- Ralf Brown. 2008. Exploiting Document-Level Context for Data-Driven Machine Translation. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 46–55, Waikiki, USA. Association for Machine Translation in the Americas.
- Cite (Informal):
- Exploiting Document-Level Context for Data-Driven Machine Translation (Brown, AMTA 2008)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/2008.amta-papers.2.pdf