Abstract
In this paper, we describe Dell EMC’s framework to automatically collect MT-related productivity metrics from a large translation supply chain over an extended period of time, the characteristics and volume of the gathered data, and the insights from analyzing the data to guide our MT strategy. Aligning tools, processes and people required decisions, concessions and contributions from Dell management, technology providers, tool implementors, LSPs and linguists to harvest data at scale over 2+ years while Dell EMC migrated from customized SMT to generic NMT and then customized NMT systems. For content in two quality tiers, we ranked language pairs by productivity, graphed trendlines, compared the time needed to edit machine translations versus fuzzy matches, and studied the time spent on segments with no post-edits. Going by post-edit density, we also reviewed the distribution of segments on a post-edit scale of 1 to 10 and any correlation between the extent of edits and segment length.

- Anthology ID:
- 2020.eamt-1.38
- Volume:
- Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Lisboa, Portugal
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation
- Pages:
- 353–362
- URL:
- https://aclanthology.org/2020.eamt-1.38
- Cite (ACL):
- Georg Kirchner. 2020. Insights from Gathering MT Productivity Metrics at Scale. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 353–362, Lisboa, Portugal. European Association for Machine Translation.
- Cite (Informal):
- Insights from Gathering MT Productivity Metrics at Scale (Kirchner, EAMT 2020)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2020.eamt-1.38.pdf