On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs
Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, Tom Kocmi
Abstract
Recent studies emphasize the need for document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments. In this work, we compare human assessment data from the last two WMT evaluation campaigns collected via two different methods for document-level evaluation. Our analysis shows that a document-centric approach to evaluation, in which the annotator is presented with the entire document context on screen, leads to higher-quality segment- and document-level assessments. It improves the correlation between segment and document scores and increases inter-annotator agreement for document scores, but is considerably more time-consuming for annotators.
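To make the two quantities mentioned above concrete, the sketch below shows one plausible way to compute a segment-document score correlation and a simple inter-annotator agreement proxy on toy annotation data. This is an illustrative assumption, not code from the paper: the data layout, variable names, and the use of Pearson correlation (the WMT campaigns use more elaborate agreement measures) are all hypothetical.

```python
# Illustrative sketch only: hypothetical direct-assessment scores (0-100),
# not the paper's actual data or analysis code.
import numpy as np
from scipy.stats import pearsonr
from itertools import combinations

segment_scores = {            # doc_id -> segment-level scores from one annotator
    "d1": [72, 68, 75], "d2": [55, 60, 58], "d3": [90, 88], "d4": [40, 45, 42],
}
doc_scores_by_annotator = {   # annotator -> doc_id -> document-level score
    "a1": {"d1": 70, "d2": 57, "d3": 89, "d4": 44},
    "a2": {"d1": 74, "d2": 52, "d3": 91, "d4": 39},
}

# 1) Correlation between averaged segment scores and document scores (annotator a1).
docs = sorted(segment_scores)
seg_means = [np.mean(segment_scores[d]) for d in docs]
doc_vals = [doc_scores_by_annotator["a1"][d] for d in docs]
r_seg_doc, _ = pearsonr(seg_means, doc_vals)
print(f"segment-document correlation: {r_seg_doc:.2f}")

# 2) A crude inter-annotator agreement proxy: mean pairwise Pearson r
#    over document-level scores.
annotators = sorted(doc_scores_by_annotator)
pair_rs = []
for a, b in combinations(annotators, 2):
    xs = [doc_scores_by_annotator[a][d] for d in docs]
    ys = [doc_scores_by_annotator[b][d] for d in docs]
    pair_rs.append(pearsonr(xs, ys)[0])
print(f"mean pairwise annotator correlation: {np.mean(pair_rs):.2f}")
```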
- Anthology ID: 2021.humeval-1.11
- Volume: Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)
- Month: April
- Year: 2021
- Address: Online
- Editors: Anya Belz, Shubham Agarwal, Yvette Graham, Ehud Reiter, Anastasia Shimorina
- Venue: HumEval
- Publisher: Association for Computational Linguistics
- Pages: 97–106
- URL: https://preview.aclanthology.org/add_missing_videos/2021.humeval-1.11/
- Cite (ACL): Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, and Tom Kocmi. 2021. On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval), pages 97–106, Online. Association for Computational Linguistics.
- Cite (Informal): On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs (Grundkiewicz et al., HumEval 2021)
- PDF: https://preview.aclanthology.org/add_missing_videos/2021.humeval-1.11.pdf