Nick Ruiz


2012

MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation
Nick Ruiz | Marcello Federico
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

This paper provides a fast alternative to Minimum Discrimination Information-based (MDI) language model adaptation for statistical machine translation. Classic MDI requires a normalization term whose computation needs full model probabilities (including back-off probabilities) for all n-grams; our method avoids it. Rather than re-estimating an entire language model, our Lazy MDI approach leverages a smoothed unigram ratio between an adaptation text and the background language model to scale only the n-gram probabilities corresponding to translation options gathered by the SMT decoder. The effect of the unigram ratio is controlled by a separate feature weight in the log-linear discriminative model. We present results on the IWSLT 2012 TED talk translation task and show that Lazy MDI provides language model adaptation performance comparable to classic MDI.
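The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the add-alpha smoothing, the function names, and the way the ratio is summed over a candidate's words are all assumptions made for illustration. It shows a smoothed unigram ratio between an adaptation text and a background text being used as an extra log-linear feature with its own weight, with no normalization over the full language model.

```python
import math
from collections import Counter


def unigram_probs(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary.

    The smoothing scheme is an assumption; the abstract only says "smoothed".
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def unigram_ratio_feature(candidate, p_adapt, p_background):
    """Sum of log unigram ratios over the words of one translation option.

    Words outside the shared vocabulary contribute 0 (a ratio of 1).
    """
    score = 0.0
    for w in candidate:
        if w in p_adapt and w in p_background:
            score += math.log(p_adapt[w] / p_background[w])
    return score


def loglinear_score(lm_logprob, ratio_feature, lm_weight, ratio_weight):
    """Combine the background LM score and the ratio feature log-linearly."""
    return lm_weight * lm_logprob + ratio_weight * ratio_feature


# Toy data (hypothetical): a background text and a small adaptation text.
background = "the cat sat on the mat the dog sat".split()
adaptation = "the talk the talk the idea".split()
vocab = set(background) | set(adaptation)

p_b = unigram_probs(background, vocab)
p_a = unigram_probs(adaptation, vocab)
```

In this sketch, words that are relatively more frequent in the adaptation text get a positive feature value and words that are relatively rarer get a negative one. Setting `ratio_weight` to 0 recovers the unadapted score.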

2011

Fill-up versus interpolation methods for phrase-based SMT adaptation
Arianna Bisazza | Nick Ruiz | Marcello Federico
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper compares techniques to combine diverse parallel corpora for domain-specific phrase-based SMT system training. We address a common scenario where little in-domain data is available for the task, but where large background models exist for the same language pair. In particular, we focus on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage, while preserving the more reliable information coming from the in-domain corpus. We present experiments on an emerging transcribed speech translation task – the TED talks. While performing similarly in terms of BLEU and NIST scores to the popular log-linear and linear interpolation techniques, filled-up translation models are more compact and easier to tune with minimum error rate training.
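As a rough illustration of the fill-up idea described in this abstract, the sketch below merges two toy phrase tables so that in-domain entries are kept unchanged and background entries are added only for phrase pairs the in-domain table lacks. The dictionary layout, the function name, and the appended provenance flag are assumptions for illustration, not details taken from the abstract.

```python
def fill_up(in_domain, background):
    """Merge two phrase tables, preferring in-domain entries.

    Each table maps (source_phrase, target_phrase) -> list of feature scores.
    A provenance flag is appended to every entry (assumed encoding:
    0.0 for in-domain, 1.0 for entries filled in from the background table).
    """
    merged = {pair: scores + [0.0] for pair, scores in in_domain.items()}
    for pair, scores in background.items():
        if pair not in merged:  # only fill gaps in in-domain coverage
            merged[pair] = scores + [1.0]
    return merged


# Toy tables (hypothetical scores).
in_domain = {("casa", "house"): [0.7]}
background = {("casa", "house"): [0.4], ("casa", "home"): [0.3]}

merged = fill_up(in_domain, background)
```

Because each phrase pair appears only once, with a single set of scores, the merged table stays smaller than one that keeps separate scores from both sources.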

Topic Adaptation for Lecture Translation through Bilingual Latent Semantic Models
Nick Ruiz | Marcello Federico
Proceedings of the Sixth Workshop on Statistical Machine Translation