Using a large monolingual corpus to improve translation accuracy

Radu Soricut; Kevin Knight; Daniel Marcu

Abstract

The existence of a phrase in a large monolingual corpus is very useful information, and so is its frequency. We introduce an alternative approach to automatic translation of phrases/sentences that operationalizes this observation. We use a statistical machine translation system to produce alternative translations and a large monolingual corpus to (re)rank these translations. Our results show that this combination yields better translations, especially when translating out-of-domain phrases/sentences. Our approach can also be used to automatically construct parallel corpora from monolingual resources.
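
As a rough illustration of the reranking idea described in the abstract, the toy Python sketch below counts phrases in a small monolingual corpus and uses those counts to reorder candidate translations. It is not the authors' system. The function names `count_phrases` and `rerank`, the n-gram length limit, the exact-match lookup, and the ordering rule (attested phrases first, then frequency, then translation score) are all assumptions made for illustration, and the hand-written candidate list stands in for the output of a statistical machine translation system.

```python
from collections import Counter


def count_phrases(corpus_sentences, max_len=4):
    """Count every word n-gram of length 1..max_len in a monolingual corpus.

    Hypothetical helper: tokenization is plain lowercase whitespace splitting.
    """
    counts = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def rerank(candidates, phrase_counts):
    """Reorder (translation, mt_score) pairs using corpus evidence.

    Assumed ordering: phrases attested in the corpus come first, then higher
    corpus frequency, then higher translation-model score.
    """
    def key(item):
        text, mt_score = item
        freq = phrase_counts.get(text.lower(), 0)
        return (freq > 0, freq, mt_score)

    return sorted(candidates, key=key, reverse=True)


if __name__ == "__main__":
    # Tiny stand-in for a large monolingual corpus.
    corpus = [
        "he kicked the bucket yesterday",
        "she kicked the bucket",
        "the bucket was full",
    ]
    # Made-up candidates with made-up translation scores.
    candidates = [("kicked the pail", 0.6), ("kicked the bucket", 0.4)]
    print(rerank(candidates, count_phrases(corpus)))
    # prints [('kicked the bucket', 0.4), ('kicked the pail', 0.6)]
```

In this sketch the candidate with the lower translation score wins because it is the only one attested in the corpus, which mirrors the abstract's observation that both the existence and the frequency of a phrase carry useful information.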

Anthology ID: 2002.amta-papers.16
Volume: Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month: October 8-12
Year: 2002
Address: Tiburon, USA
Editor: Stephen D. Richardson
Venue: AMTA
Publisher: Springer
Pages: 155–164
URL: https://link.springer.com/chapter/10.1007/3-540-45820-4_16
Cite (ACL): Radu Soricut, Kevin Knight, and Daniel Marcu. 2002. Using a large monolingual corpus to improve translation accuracy. In Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 155–164, Tiburon, USA. Springer.
Cite (Informal): Using a large monolingual corpus to improve translation accuracy (Soricut et al., AMTA 2002)
PDF: https://link.springer.com/chapter/10.1007/3-540-45820-4_16