Achieving Domain Specificity in SMT without Overt Siloing

William D. Lewis, Chris Wendt, David Bullock


Abstract
We examine pooling data as a method for improving Statistical Machine Translation (SMT) quality for narrowly defined domains, such as data for a particular company or public entity. By pooling all available data, building large SMT engines, and using domain-specific target language models, we see boosts in quality, and can achieve the generalizability and resiliency of a larger SMT but with the precision of a domain-specific engine.
Anthology ID:
L10-1545
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/791_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
William D. Lewis, Chris Wendt, and David Bullock. 2010. Achieving Domain Specificity in SMT without Overt Siloing. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Achieving Domain Specificity in SMT without Overt Siloing (Lewis et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/791_Paper.pdf