Refining Word Alignment with Discriminative Training

Nadi Tomeh, Alexandre Allauzen, François Yvon, Guillaume Wisniewski


Abstract
The quality of statistical machine translation systems depends on the quality of the word alignments computed during the translation model training phase. IBM alignment models, as implemented in the GIZA++ toolkit, constitute the de facto standard for performing these computations. The resulting alignments and translation models are, however, very noisy, and several authors have tried to improve them. In this work, we propose a simple and effective approach that treats alignment as a series of independent binary classification problems, one for each cell of the alignment matrix. Through extensive feature engineering and the use of stacking techniques, we were able to obtain alignments much closer to manually defined references than those obtained by the IBM models. These alignments also yield better translation models, delivering improved performance in a large-scale Arabic-to-English translation task.
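To make the idea of per-cell binary classification concrete, here is a minimal sketch. It is not the authors' implementation: the three features, the hand-set weights, the 0.5 threshold, and the names link_features, link_probability, and predict_alignment are all assumptions chosen for illustration. The abstract only states that each link decision is made independently from engineered features.

```python
import math

def link_features(i, j, src_len, tgt_len, base_links):
    """Features for one cell (i, j) of the alignment matrix (hypothetical feature set)."""
    return [
        1.0,                                    # bias term
        abs(i / src_len - j / tgt_len),         # distance from the diagonal
        1.0 if (i, j) in base_links else 0.0,   # link proposed by some base aligner
    ]

def link_probability(features, weights):
    """Logistic model: probability that the link is active given its features."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

def predict_alignment(src_len, tgt_len, base_links, weights, threshold=0.5):
    """Classify every cell independently and return the set of active links."""
    links = set()
    for i in range(src_len):
        for j in range(tgt_len):
            feats = link_features(i, j, src_len, tgt_len, base_links)
            if link_probability(feats, weights) > threshold:
                links.add((i, j))
    return links

# Hand-set weights for illustration only; a real system would learn them
# from manually aligned sentence pairs.
WEIGHTS = [-2.0, -6.0, 5.0]

# Links proposed by a hypothetical base aligner for a 3x3 sentence pair.
base = {(0, 0), (1, 2), (2, 1), (0, 2)}
alignment = predict_alignment(3, 3, base, WEIGHTS)
```

With these assumed weights, the classifier keeps the three base links that lie near the diagonal and rejects the distant link (0, 2). Because each cell is scored on its own, the result is simply the set of cells whose probability exceeds the threshold.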
Anthology ID:
2010.amta-papers.18
Volume:
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers
Month:
October 31-November 4
Year:
2010
Address:
Denver, Colorado, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2010.amta-papers.18/
Cite (ACL):
Nadi Tomeh, Alexandre Allauzen, François Yvon, and Guillaume Wisniewski. 2010. Refining Word Alignment with Discriminative Training. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers, Denver, Colorado, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Refining Word Alignment with Discriminative Training (Tomeh et al., AMTA 2010)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2010.amta-papers.18.pdf