Chris Wendt


2017

2016

2013

2012

2010

This paper outlines the methodologies Microsoft has deployed for seamless integration of human translation into the translation workflow, and describes a variety of methods to gather and collect human translation data. Increased amounts of parallel training data help to enhance the translation quality of the statistical MT system in use at Microsoft. The presentation covers the theory, the technical methodology as well as the experiences Microsoft has with the implementation, and practical use of such a system. Included is a discussion of the factors influencing the translation quality of a statistical MT system, a short description of the feedback collection mechanism in use at Microsoft, and the metrics it observed on its MT deployments.
We examine pooling data as a method for improving Statistical Machine Translation (SMT) quality for narrowly defined domains, such as data for a particular company or public entity. By pooling all available data, building large SMT engines, and using domain-specific target language models, we see boosts in quality, and can achieve the generalizability and resiliency of a larger SMT but with the precision of a domain-specific engine.

2009