MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation

Nick Ruiz, Marcello Federico


Abstract
This paper provides a fast alternative to Minimum Discrimination Information-based language model adaptation for statistical machine translation. Classic MDI adaptation requires a normalization term computed from full model probabilities (including back-off probabilities) over all n-grams; we avoid this step. Rather than re-estimating an entire language model, our Lazy MDI approach leverages a smoothed unigram ratio between an adaptation text and the background language model to scale only the n-gram probabilities corresponding to translation options gathered by the SMT decoder. The effect of the unigram ratio is controlled by an additional feature weight in the log-linear discriminative model. We present results on the IWSLT 2012 TED talk translation task and show that Lazy MDI provides language model adaptation performance comparable to classic MDI.
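The abstract's core idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the exponent `gamma`, and the floor value for unseen background words are all assumptions, and the actual smoothing of the unigram ratio in the paper may differ.

```python
from collections import Counter
import math

def unigram_ratios(adapt_tokens, bg_probs, gamma=0.5, floor=1e-6):
    """Smoothed unigram ratio between an adaptation text and a
    background LM: alpha(w) = (P_adapt(w) / P_bg(w)) ** gamma.
    gamma < 1 damps the ratio, a common MDI-style smoothing choice."""
    counts = Counter(adapt_tokens)
    total = sum(counts.values())
    ratios = {}
    for w, c in counts.items():
        p_adapt = c / total
        p_bg = bg_probs.get(w, floor)  # floor for words unseen in background
        ratios[w] = (p_adapt / p_bg) ** gamma
    return ratios

def lazy_mdi_score(ngram, lm_logprob, ratios, weight=1.0):
    """Rescale a background LM log-probability for one n-gram by the
    log unigram ratio of its predicted word; no global normalization.
    `weight` plays the role of the extra log-linear feature weight."""
    w = ngram[-1]
    return lm_logprob + weight * math.log(ratios.get(w, 1.0))
```

Because only the n-grams actually queried by the decoder are rescaled, the expensive normalization over the whole vocabulary is never performed; the log-linear weight then lets tuning decide how strongly the adaptation signal is trusted.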
Anthology ID:
2012.iwslt-papers.14
Volume:
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
Month:
December 6-7
Year:
2012
Address:
Hong Kong
Venue:
IWSLT
SIG:
SIGSLT
Pages:
244–251
URL:
https://aclanthology.org/2012.iwslt-papers.14
Cite (ACL):
Nick Ruiz and Marcello Federico. 2012. MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation. In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, pages 244–251, Hong Kong.
Cite (Informal):
MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation (Ruiz & Federico, IWSLT 2012)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2012.iwslt-papers.14.pdf