Fill-up versus interpolation methods for phrase-based SMT adaptation

Arianna Bisazza; Nick Ruiz; Marcello Federico

Abstract

This paper compares techniques for combining diverse parallel corpora to train domain-specific phrase-based SMT systems. We address a common scenario in which little in-domain data is available for the task, but large background models exist for the same language pair. In particular, we focus on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage while preserving the more reliable information coming from the in-domain corpus. We present experiments on an emerging speech translation task based on transcribed TED talks. While performing similarly in terms of BLEU and NIST scores to the popular log-linear and linear interpolation techniques, filled-up translation models are more compact and easier to tune by minimum error rate training.
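The fill-up idea summarized above can be illustrated with a short Python sketch. This is a minimal sketch, not the authors' implementation: the dictionary-based `PhraseTable` type and the `fill_up` function are hypothetical, and the extra binary provenance feature (0.0 for in-domain entries, 1.0 for background entries, stored directly as a log-domain value) is an assumption about how filled-up entries can be marked so that the tuner can learn how much to trust them.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory representation: (source phrase, target phrase) -> feature scores.
PhraseTable = Dict[Tuple[str, str], List[float]]


def fill_up(in_domain: PhraseTable, background: PhraseTable) -> PhraseTable:
    """Combine two phrase tables by fill-up.

    In-domain entries are copied unchanged; background entries are added only
    for phrase pairs the in-domain table lacks. Each entry gets one extra
    provenance feature (assumption: log-domain value 0.0 = in-domain,
    1.0 = background).
    """
    merged: PhraseTable = {}
    for pair, scores in in_domain.items():
        merged[pair] = list(scores) + [0.0]
    for pair, scores in background.items():
        if pair not in merged:  # fill only the coverage gaps
            merged[pair] = list(scores) + [1.0]
    return merged


if __name__ == "__main__":
    # Toy tables with two log-probability features each (made-up numbers).
    in_domain = {("la casa", "the house"): [-0.1, -0.2]}
    background = {
        ("la casa", "the house"): [-0.3, -0.4],  # shadowed by the in-domain entry
        ("la casa", "the home"): [-0.9, -1.1],   # missing in-domain, so it is added
    }
    for pair, scores in sorted(fill_up(in_domain, background).items()):
        print(pair, scores)
```

Because background scores are copied only for phrase pairs absent from the in-domain table, every in-domain estimate survives unchanged, while coverage grows to the union of both tables. In this sketch the combined model carries just one feature more than either input table, so tuning has a single extra weight to set.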

Anthology ID:: 2011.iwslt-evaluation.18
Volume:: Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
Month:: December 8-9
Year:: 2011
Address:: San Francisco, California
Editors:: Marcello Federico, Mei-Yuh Hwang, Margit Rödder, Sebastian Stüker
Venue:: IWSLT
SIG:: SIGSLT
Pages:: 136–143
URL:: https://preview.aclanthology.org/fix-sig-urls/2011.iwslt-evaluation.18/
Cite (ACL):: Arianna Bisazza, Nick Ruiz, and Marcello Federico. 2011. Fill-up versus interpolation methods for phrase-based SMT adaptation. In Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign, pages 136–143, San Francisco, California.
Cite (Informal):: Fill-up versus interpolation methods for phrase-based SMT adaptation (Bisazza et al., IWSLT 2011)
PDF:: https://preview.aclanthology.org/fix-sig-urls/2011.iwslt-evaluation.18.pdf