Using a large monolingual corpus to improve translation accuracy

Radu Soricut, Kevin Knight, Daniel Marcu


Abstract
The existence of a phrase in a large monolingual corpus is very useful information, and so is its frequency. We introduce an alternative approach to automatic translation of phrases/sentences that operationalizes this observation. We use a statistical machine translation system to produce alternative translations and a large monolingual corpus to (re)rank these translations. Our results show that this combination yields better translations, especially when translating out-of-domain phrases/sentences. Our approach can also be used to automatically construct parallel corpora from monolingual resources.
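The reranking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the abstract gives no implementation details, so the functions `phrase_frequency` and `rerank`, the toy corpus, the candidate list, and the tie-breaking on a translation-model score are all assumptions made for illustration. The sketch only shows the general principle that candidate translations attested more often in a monolingual corpus are ranked higher.

```python
def phrase_frequency(phrase, corpus_sentences):
    """Count how often `phrase` occurs as a contiguous token sequence in the corpus."""
    target = phrase.lower().split()
    n = len(target)
    if n == 0:
        return 0
    total = 0
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        # Slide a window of length n over the sentence and compare token by token.
        total += sum(
            1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target
        )
    return total


def rerank(candidates, corpus_sentences):
    """Sort (translation, score) pairs by corpus frequency, highest first.

    The second tuple element is a hypothetical translation-model score,
    used here only to break ties between equally frequent candidates.
    """
    return sorted(
        candidates,
        key=lambda c: (phrase_frequency(c[0], corpus_sentences), c[1]),
        reverse=True,
    )


# Toy monolingual corpus and candidate translations (illustrative data only).
corpus = [
    "He made a decision yesterday",
    "She made a decision quickly",
    "They took a decision",
]
candidates = [
    ("did a decision", 0.5),
    ("made a decision", 0.3),
    ("took a decision", 0.2),
]

ranked = rerank(candidates, corpus)
for translation, score in ranked:
    print(translation, phrase_frequency(translation, corpus), score)
```

In this toy setting, the candidate with the highest hypothetical model score ("did a decision") never occurs in the corpus and drops to the bottom, while the attested candidates move up. A realistic system would need an indexed corpus rather than a linear scan, and some way to combine corpus evidence with the translation model's score.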
Anthology ID:
2002.amta-papers.16
Volume:
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
October 8-12
Year:
2002
Address:
Tiburon, USA
Venue:
AMTA
Publisher:
Springer
Pages:
155–164
URL:
https://link.springer.com/chapter/10.1007/3-540-45820-4_16
Cite (ACL):
Radu Soricut, Kevin Knight, and Daniel Marcu. 2002. Using a large monolingual corpus to improve translation accuracy. In Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 155–164, Tiburon, USA. Springer.
Cite (Informal):
Using a large monolingual corpus to improve translation accuracy (Soricut et al., AMTA 2002)
PDF:
https://link.springer.com/chapter/10.1007/3-540-45820-4_16