Abstract
Phrase alignment is a crucial step in phrase-based statistical machine translation. We explore a way of improving phrase alignment by adding syntactic information, in the form of chunks, as soft constraints, guided by a detailed analysis of a hand-aligned data set. We extend a probabilistic phrase alignment model that extracts phrase pairs by optimizing phrase pair boundaries over the sentence pair [1]. The boundaries of the target phrase are chosen such that the overall sentence alignment probability is optimal. The extended model also incorporates Viterbi alignment information with a view to improving phrase alignment. We extract phrase pairs using a relatively large number of features, which are discriminatively trained with a large-margin online learning algorithm, the Margin Infused Relaxed Algorithm (MIRA), which we integrate into our approach. Initial experiments show improvements in both phrase alignment and translation quality for Arabic-English on a moderate-size translation task.
- Anthology ID:
- 2011.iwslt-evaluation.23
- Volume:
- Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
- Month:
- December 8-9
- Year:
- 2011
- Address:
- San Francisco, California
- Editors:
- Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker
- Venue:
- IWSLT
- SIG:
- SIGSLT
- Pages:
- 175–182
- URL:
- https://aclanthology.org/2011.iwslt-evaluation.23
- Cite (ACL):
- Mridul Gupta, Sanjika Hewavitharana, and Stephan Vogel. 2011. Extending a probabilistic phrase alignment approach for SMT. In Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign, pages 175–182, San Francisco, California.
- Cite (Informal):
- Extending a probabilistic phrase alignment approach for SMT (Gupta et al., IWSLT 2011)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2011.iwslt-evaluation.23.pdf