Jennifer Drexler
2014
A Wikipedia-based Corpus for Contextualized Machine Translation
Jennifer Drexler
|
Pushpendre Rastogi
|
Jacqueline Aguilar
|
Benjamin Van Durme
|
Matt Post
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We describe a corpus for target-contextualized machine translation (MT), where the task is to improve the translation of source documents using language models built over presumably related documents in the target language. The idea presumes a situation where most of the information about a topic is in a foreign language, yet some related target-language information is known to exist. Our corpus comprises a set of curated English Wikipedia articles describing news events, along with (i) their Spanish counterparts and (ii) some of the Spanish source articles cited within them. In experiments, we translated these Spanish documents, treating the English articles as target-side context, and evaluate the effect on translation quality when including target-side language models built over this English context and interpolated with other, separately-derived language model data. We find that even under this simplistic baseline approach, we achieve significant improvements as measured by BLEU score.
2012
The MIT-LL/AFRL IWSLT 2012 MT system
Jennifer Drexler
|
Wade Shen
|
Tim Anderson
|
Raymond Slyh
|
Brian Ore
|
Eric Hansen
|
Terry Gleason
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2012 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic to English and English to French TED-talk translation task. We also applied our existing ASR system to the TED-talk lecture ASR task, and combined our ASR and MT systems for the TED-talk SLT task. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2011 system, and experiments we ran during the IWSLT-2012 evaluation. Specifically, we focus on 1) cross-domain translation using MAP adaptation, 2) cross-entropy filtering of MT training data, and 3) improved Arabic morphology for MT preprocessing.
2011
The MIT-LL/AFRL IWSLT-2011 MT system
A. Ryan Aminzadeh
|
Tim Anderson
|
Ray Slyh
|
Brian Ore
|
Eric Hansen
|
Wade Shen
|
Jennifer Drexler
|
Terry Gleason
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes the MIT-LL/AFRL statistical MT system and the improvements that were developed during the IWSLT 2011 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Arabic to English and English to French TED-talk translation tasks. We also applied our existing ASR system to the TED-talk lecture ASR task. We discuss the architecture of the MIT-LL/AFRL MT system, improvements over our 2010 system, and experiments we ran during the IWSLT-2011 evaluation. Specifically, we focus on 1) speech recognition for lecture-like data, 2) cross-domain translation using MAP adaptation, and 3) improved Arabic morphology for MT preprocessing.
Search
Co-authors
- Tim Anderson 2
- Brian Ore 2
- Eric Hansen 2
- Wade Shen 2
- Terry Gleason 2
- show all...