Stacey Bailey

2010

Data Preparation for Machine Translation Customization
Stacey Bailey
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program

The presentation will focus on ongoing work to develop sentence-aligned Chinese-English data for machine translation customization. Fully automatic alignment produces noisy data (e.g., containing OCR and alignment errors), and we are looking at the question of just how noisy such data can be and still produce translation improvements. Relatedly, data clean-up efforts are time- and labor-intensive, and we are examining whether the translation improvements justify the clean-up costs.
