Translating Structured Documents

George Foster, Pierre Isabelle, Roland Kuhn


Abstract
Machine Translation traditionally treats documents as sets of independent sentences. In many genres, however, documents are highly structured, and their structure contains information that can be used to improve translation quality. We present a preliminary approach to document translation that uses structural features to modify the behaviour of a language model, at sentence-level granularity. To our knowledge, this is the first attempt to incorporate structural information into statistical MT. In experiments on structured English/French documents from the Hansard corpus, we demonstrate small but statistically significant improvements.
Anthology ID:
2010.amta-papers.24
Volume:
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 31-November 4
Year:
2010
Address:
Denver, Colorado, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2010.amta-papers.24
Cite (ACL):
George Foster, Pierre Isabelle, and Roland Kuhn. 2010. Translating Structured Documents. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers, Denver, Colorado, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Translating Structured Documents (Foster et al., AMTA 2010)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2010.amta-papers.24.pdf