Stacey Bailey


2010

The presentation will focus on ongoing work to develop sentence-aligned Chinese-English data for machine translation customization. Fully automatic alignment produces noisy data (e.g., containing OCR and alignment errors), and we are looking at the question of just how noisy noisy data can be and still produce translation improvements. Related, data clean-up efforts are time- and labor-intensive and we are examining whether translation improvements justify the clean-up costs.

2008