Abstract
This paper describes the University of Maryland statistical machine translation system used in the IWSLT 2007 evaluation. Our focus was threefold: using hierarchical phrase-based models in spoken language translation, the incorporation of sub-lexical information in model estimation via morphological analysis (Arabic) and word and character segmentation (Chinese), and the use of n-gram sequence models for source-side punctuation prediction. Our efforts yield significant improvements in Chinese-English and Arabic-English translation tasks for both spoken language and human transcription conditions.
- Anthology ID:
- 2007.iwslt-1.28
- Volume:
- Proceedings of the Fourth International Workshop on Spoken Language Translation
- Month:
- October 15-16
- Year:
- 2007
- Address:
- Trento, Italy
- Venue:
- IWSLT
- SIG:
- SIGSLT
- URL:
- https://aclanthology.org/2007.iwslt-1.28
- Cite (ACL):
- Christopher J. Dyer. 2007. The University of Maryland translation system for IWSLT 2007. In Proceedings of the Fourth International Workshop on Spoken Language Translation, Trento, Italy.
- Cite (Informal):
- The University of Maryland translation system for IWSLT 2007 (Dyer, IWSLT 2007)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/2007.iwslt-1.28.pdf