QE4PE: Word-level Quality Estimation for Human Post-Editing
Gabriele Sarti, Vilém Zouhar, Grzegorz Chrupała, Ana Guerberof-Arenas, Malvina Nissim, Arianna Bisazza
Abstract
Word-level quality estimation (QE) methods aim to detect erroneous spans in machine translations, which can direct and facilitate human post-editing. While the accuracy of word-level QE systems has been assessed extensively, their usability and downstream influence on the speed, quality, and editing choices of human post-editing remain understudied. In this study, we investigate the impact of word-level QE on machine translation (MT) post-editing in a realistic setting involving 42 professional post-editors across two translation directions. We compare four error-span highlight modalities, including supervised and uncertainty-based word-level QE methods, for identifying potential errors in the outputs of a state-of-the-art neural MT model. Post-editing effort and productivity are estimated from behavioral logs, while quality improvements are assessed by word- and segment-level human annotation. We find that domain, language and editors’ speed are critical factors in determining highlights’ effectiveness, with modest differences between human-made and automated QE highlights underlining a gap between accuracy and usability in professional workflows.
- Anthology ID:
- 2025.tacl-1.64
- Volume:
- Transactions of the Association for Computational Linguistics, Volume 13
- Year:
- 2025
- Address:
- Cambridge, MA
- Venue:
- TACL
- Publisher:
- MIT Press
- Pages:
- 1410–1435
- URL:
- https://preview.aclanthology.org/fix-opsupmap-display/2025.tacl-1.64/
- DOI:
- 10.1162/tacl.a.46
- Cite (ACL):
- Gabriele Sarti, Vilém Zouhar, Grzegorz Chrupała, Ana Guerberof-Arenas, Malvina Nissim, and Arianna Bisazza. 2025. QE4PE: Word-level Quality Estimation for Human Post-Editing. Transactions of the Association for Computational Linguistics, 13:1410–1435.
- Cite (Informal):
- QE4PE: Word-level Quality Estimation for Human Post-Editing (Sarti et al., TACL 2025)
- PDF:
- https://preview.aclanthology.org/fix-opsupmap-display/2025.tacl-1.64.pdf