Wider Pipelines: N-Best Alignments and Parses in MT Training
Ashish Venugopal, Andreas Zollmann, Noah A. Smith, Stephan Vogel
Abstract
State-of-the-art statistical machine translation systems use hypotheses from several maximum a posteriori inference steps, including word alignments and parse trees, to identify translational structure and estimate the parameters of translation models. While this approach leads to a modular pipeline of independently developed components, errors made in these “single-best” hypotheses can propagate to downstream estimation steps that treat these inputs as clean, trustworthy training data. In this work we integrate N-best alignments and parses by using a probability distribution over these alternatives to generate posterior fractional counts for use in downstream estimation. Using these fractional counts in a DOP-inspired syntax-based translation system, we show significant improvements in translation quality over a single-best trained baseline.
- Anthology ID:
- 2008.amta-papers.18
- Volume:
- Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
- Month:
- October 21-25
- Year:
- 2008
- Address:
- Waikiki, USA
- Venue:
- AMTA
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 192–201
- URL:
- https://aclanthology.org/2008.amta-papers.18
- Cite (ACL):
- Ashish Venugopal, Andreas Zollmann, Noah A. Smith, and Stephan Vogel. 2008. Wider Pipelines: N-Best Alignments and Parses in MT Training. In Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, pages 192–201, Waikiki, USA. Association for Machine Translation in the Americas.
- Cite (Informal):
- Wider Pipelines: N-Best Alignments and Parses in MT Training (Venugopal et al., AMTA 2008)
- PDF:
- https://aclanthology.org/2008.amta-papers.18.pdf
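The core idea in the abstract (replacing 1-best counts with posterior fractional counts over an N-best list) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hypothesis scores, the `rule_*` event names, and the `posterior_fractional_counts` helper are all hypothetical, and the paper's actual rule extraction is far more involved.

```python
import math
from collections import defaultdict

def posterior_fractional_counts(nbest, temperature=1.0):
    """Turn an N-best list into posterior-weighted fractional counts.

    `nbest` is a list of (log_score, events) pairs, where `events` holds
    the items (e.g. extracted translation rules) licensed by that
    hypothesis (alignment or parse). Hypothetical interface for illustration.
    """
    # Normalize log scores into a posterior distribution (log-sum-exp trick).
    log_scores = [s / temperature for s, _ in nbest]
    m = max(log_scores)
    z = m + math.log(sum(math.exp(s - m) for s in log_scores))
    posteriors = [math.exp(s - z) for s in log_scores]

    # Each hypothesis contributes its posterior mass, not a count of 1,
    # to every event it licenses.
    counts = defaultdict(float)
    for p, (_, events) in zip(posteriors, nbest):
        for e in events:
            counts[e] += p
    return dict(counts)

# Toy example: two alternative alignments sharing one extracted rule.
nbest = [
    (-1.0, ["rule_A", "rule_B"]),
    (-2.0, ["rule_A", "rule_C"]),
]
counts = posterior_fractional_counts(nbest)
# "rule_A" appears in both hypotheses, so its fractional count is 1.0;
# "rule_B" and "rule_C" split the mass according to the posteriors.
```

A 1-best pipeline would instead count only `rule_A` and `rule_B` (each once) and drop `rule_C` entirely; the fractional counts preserve evidence from the lower-ranked alternative in proportion to its posterior probability.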