Abstract
Document-level human evaluation of machine translation (MT) has been gaining interest in the community. However, little is known about the issues that arise when document-level methodologies are used to assess MT quality. In this article, we compare inter-annotator agreement (IAA) scores across different document-level methodologies, the effort required to assess quality with each, and the issue of misevaluation when sentences are evaluated out of context.
- Anthology ID: 2021.humeval-1.4
- Volume: Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)
- Month: April
- Year: 2021
- Address: Online
- Editors: Anya Belz, Shubham Agarwal, Yvette Graham, Ehud Reiter, Anastasia Shimorina
- Venue: HumEval
- Publisher: Association for Computational Linguistics
- Pages: 34–45
- URL: https://aclanthology.org/2021.humeval-1.4
- Cite (ACL): Sheila Castilho. 2021. Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation. In Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval), pages 34–45, Online. Association for Computational Linguistics.
- Cite (Informal): Towards Document-Level Human MT Evaluation: On the Issues of Annotator Agreement, Effort and Misevaluation (Castilho, HumEval 2021)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/2021.humeval-1.4.pdf