Exploiting Document-Level Context for Data-Driven Machine Translation

Ralf D. Brown

Abstract

This paper presents a method for exploiting document-level similarity between the input documents that a corpus-driven (statistical or example-based) machine translation system must translate and the documents in its training corpus. The method is simple to implement, efficient (it increases the translation time of an example-based system by only a few percent), and robust (it still works when the actual document boundaries in the input text are not known). In experiments on French-English and Arabic-English, it yielded relative BLEU gains of up to 7.4% and 5.4%, respectively, over the same system without document-level similarity.

Anthology ID: 2008.amta-papers.2
Volume: Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Month: October 21-25
Year: 2008
Address: Waikiki, USA
Venue: AMTA
Publisher: Association for Machine Translation in the Americas
Pages: 46–55
URL: https://aclanthology.org/2008.amta-papers.2
Cite (ACL): Ralf Brown. 2008. Exploiting Document-Level Context for Data-Driven Machine Translation. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 46–55, Waikiki, USA. Association for Machine Translation in the Americas.
Cite (Informal): Exploiting Document-Level Context for Data-Driven Machine Translation (Brown, AMTA 2008)
PDF: https://preview.aclanthology.org/auto-file-uploads/2008.amta-papers.2.pdf
