Abstract
This paper provides a fast alternative to Minimum Discrimination Information-based language model adaptation for statistical machine translation. We avoid computing a normalization term that requires full model probabilities (including back-off probabilities) for all n-grams. Rather than re-estimating an entire language model, our Lazy MDI approach leverages a smoothed unigram ratio between an adaptation text and the background language model to scale only the n-gram probabilities corresponding to translation options gathered by the SMT decoder. The effects of the unigram ratio are scaled by adding an additional feature weight to the log-linear discriminative model. We present results on the IWSLT 2012 TED talk translation task and show that Lazy MDI provides comparable language model adaptation performance to classic MDI.
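As an illustration only, the Python sketch below shows the kind of scaling the abstract describes: a smoothed unigram ratio between an adaptation text and the background corpus, applied to an n-gram's background LM log-probability without the normalization term of classic MDI. The function names, the smoothing floor, and the `gamma` exponent are assumptions made for this sketch, not the authors' implementation.

```python
import math
from collections import Counter

def unigram_ratio(adapt_tokens, background_tokens, gamma=0.5, floor=1e-6):
    """Smoothed unigram probability ratio between an in-domain adaptation
    text and the background corpus. `gamma` dampens the ratio (assumed
    here as a stand-in for the MDI scaling exponent)."""
    adapt, background = Counter(adapt_tokens), Counter(background_tokens)
    adapt_total = sum(adapt.values())
    bg_total = sum(background.values())
    ratio = {}
    for w in set(adapt) | set(background):
        p_a = max(adapt[w] / adapt_total, floor)   # floor avoids zero counts
        p_b = max(background[w] / bg_total, floor)
        ratio[w] = (p_a / p_b) ** gamma
    return ratio

def lazy_mdi_score(ngram, base_logprob, ratio, weight=1.0):
    """Rescale one n-gram's background LM log-probability by the log of the
    smoothed unigram ratio of its predicted word, skipping normalization.
    `weight` stands in for the extra feature weight tuned in the
    log-linear model."""
    predicted_word = ngram[-1]
    return base_logprob + weight * math.log(ratio.get(predicted_word, 1.0))
```

In this sketch the ratio table would be built once from the adaptation text, and `lazy_mdi_score` would be applied only to the n-grams queried by the decoder for its translation options, mirroring the "lazy" aspect described above.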
- Anthology ID:
- 2012.iwslt-papers.14
- Volume:
- Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
- Month:
- December 6-7
- Year:
- 2012
- Address:
- Hong Kong
- Venue:
- IWSLT
- SIG:
- SIGSLT
- Pages:
- 244–251
- URL:
- https://aclanthology.org/2012.iwslt-papers.14
- Cite (ACL):
- Nick Ruiz and Marcello Federico. 2012. MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation. In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, pages 244–251, Hong Kong.
- Cite (Informal):
- MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation (Ruiz & Federico, IWSLT 2012)
- PDF:
- https://preview.aclanthology.org/bionlp-24-ingestion/2012.iwslt-papers.14.pdf