Syntax-based language models for statistical machine translation

Eugene Charniak; Kevin Knight; Kenji Yamada

Abstract

We present a syntax-based language model for use in noisy-channel machine translation. In particular, a language model based upon that described in (Cha01) is combined with the syntax-based translation model described in (YK01). The resulting system was used to translate 347 sentences from Chinese to English, and its output was compared with that of an IBM-Model-4-based system and that of (YK02), all trained on the same data. The translations were sorted into four groups: good/bad syntax crossed with good/bad meaning. While the total number of translations that preserved meaning was the same for (YK02) and the syntax-based system (and both were higher than for the IBM-Model-4-based system), the syntax-based system had 45% more translations that also had good syntax than did (YK02) (and approximately 70% more than IBM Model 4). The number of translations that did not preserve meaning but at least had good grammar also increased, though to less avail.

Anthology ID: 2003.mtsummit-papers.6
Volume: Proceedings of Machine Translation Summit IX: Papers
Month: September 23-27
Year: 2003
Address: New Orleans, USA
Venue: MTSummit
URL: https://aclanthology.org/2003.mtsummit-papers.6
Cite (ACL): Eugene Charniak, Kevin Knight, and Kenji Yamada. 2003. Syntax-based language models for statistical machine translation. In Proceedings of Machine Translation Summit IX: Papers, New Orleans, USA.
Cite (Informal): Syntax-based language models for statistical machine translation (Charniak et al., MTSummit 2003)
PDF: https://preview.aclanthology.org/fix-dup-bibkey/2003.mtsummit-papers.6.pdf
