Data Preparation for Machine Translation Customization

Stacey Bailey


Abstract
The presentation will focus on ongoing work to develop sentence-aligned Chinese-English data for machine translation customization. Fully automatic alignment produces noisy data (e.g., data containing OCR and alignment errors), and we are looking at how noisy the data can be and still produce translation improvements. Relatedly, data clean-up efforts are time- and labor-intensive, and we are examining whether the translation improvements justify the clean-up costs.
Anthology ID:
2010.amta-government.14
Volume:
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program
Month:
October 31-November 4
Year:
2010
Address:
Denver, Colorado, USA
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2010.amta-government.14
Cite (ACL):
Stacey Bailey. 2010. Data Preparation for Machine Translation Customization. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program, Denver, Colorado, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Data Preparation for Machine Translation Customization (Bailey, AMTA 2010)