Bad to the Bone: Predicting the Impact of Source on MT
Abstract
It is well known that poorly written source content has a profound negative effect on the quality of machine translation, drastically reduces the productivity of post-editors and lengthens turnaround times. But what is bad, and how bad is bad? Conversely, what features are emblematic of good content, and how good is good? The impact of source on MT is crucial because much content is written by non-native authors, is created by technical specialists for a non-technical audience, and may not adhere to brand tone and voice. AI can be employed to identify such problems and predict ‘at-risk’ content prior to localization into a multitude of languages. The presentation will show how source files, and even individual sentences within those files, can be analyzed for markers of complexity and readability in order to flag content that is more likely to cause mistranslations and omissions in machine translation and subsequent post-editing. Potential solutions will be explored, such as rewriting the source to meet acceptable threshold criteria for each product and/or domain, re-routing to other machine translation engines better suited to the task at hand, and building AI-based predictive models.
- Anthology ID:
- 2021.mtsummit-up.14
- Volume:
- Proceedings of Machine Translation Summit XVIII: Users and Providers Track
- Month:
- August
- Year:
- 2021
- Address:
- Virtual
- Editors:
- Janice Campbell, Ben Huyck, Stephen Larocca, Jay Marciano, Konstantin Savenkov, Alex Yanishevsky
- Venue:
- MTSummit
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 175–199
- URL:
- https://aclanthology.org/2021.mtsummit-up.14
- Cite (ACL):
- Alex Yanishevsky. 2021. Bad to the Bone: Predicting the Impact of Source on MT. In Proceedings of Machine Translation Summit XVIII: Users and Providers Track, pages 175–199, Virtual. Association for Machine Translation in the Americas.
- Cite (Informal):
- Bad to the Bone: Predicting the Impact of Source on MT (Yanishevsky, MTSummit 2021)