Improving Syntax-Driven Translation Models by Re-structuring Divergent and Nonisomorphic Parse Tree Structures

Vamshi Ambati, Alon Lavie


Abstract
Syntax-based approaches to statistical MT require syntax-aware methods for acquiring their underlying translation models from parallel data. This acquisition process can be driven by syntactic trees for either the source or target language, or by trees on both sides. Work to date has demonstrated that using trees for both sides suffers from severe coverage problems. This is primarily due to the highly restrictive space of constituent segmentations that the trees on the two sides introduce, which adversely affects the recall of the resulting translation models. Approaches that project from trees on one side, on the other hand, have higher levels of recall, but suffer from lower precision, due to the lack of syntactically aware word alignments. In this paper we explore the issue of lexical coverage of the translation models learned in both of these scenarios. We specifically look at how the non-isomorphic nature of the parse trees for the two languages affects recall and coverage. We then propose a novel technique for restructuring target parse trees that generates highly isomorphic target trees while preserving the syntactic boundaries of constituents that were aligned in the original parse trees. We evaluate the translation models learned from these restructured trees and show that they are significantly better than those learned using trees on both sides and those learned using trees on one side.
Anthology ID:
2008.amta-srw.1
Volume:
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Student Research Workshop
Month:
October 21-25
Year:
2008
Address:
Waikiki, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
Pages:
235–244
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2008.amta-srw.1/
Cite (ACL):
Vamshi Ambati and Alon Lavie. 2008. Improving Syntax-Driven Translation Models by Re-structuring Divergent and Nonisomorphic Parse Tree Structures. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Student Research Workshop, pages 235–244, Waikiki, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Improving Syntax-Driven Translation Models by Re-structuring Divergent and Nonisomorphic Parse Tree Structures (Ambati & Lavie, AMTA 2008)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2008.amta-srw.1.pdf