Abstract
Document-level (doc-level) human evaluation of machine translation (MT) has raised interest in the community after a few attempts have disproved claims of “human parity” (Toral et al., 2018; Läubli et al., 2018). However, little is known about best practices regarding doc-level human evaluation. The goal of this project is to identify which methodologies better cope with i) the current state-of-the-art (SOTA) human metrics, ii) a possible complexity when assigning a single score to a text consisting of ‘good’ and ‘bad’ sentences, iii) a possible tiredness bias in doc-level set-ups, and iv) the difference in inter-annotator agreement (IAA) between sentence and doc-level set-ups.
- Anthology ID:
- 2020.eamt-1.49
- Volume:
- Proceedings of the 22nd Annual Conference of the European Association for Machine Translation
- Month:
- November
- Year:
- 2020
- Address:
- Lisboa, Portugal
- Venue:
- EAMT
- Publisher:
- European Association for Machine Translation
- Pages:
- 455–456
- URL:
- https://aclanthology.org/2020.eamt-1.49
- Cite (ACL):
- Sheila Castilho. 2020. Document-Level Machine Translation Evaluation Project: Methodology, Effort and Inter-Annotator Agreement. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 455–456, Lisboa, Portugal. European Association for Machine Translation.
- Cite (Informal):
- Document-Level Machine Translation Evaluation Project: Methodology, Effort and Inter-Annotator Agreement (Castilho, EAMT 2020)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/2020.eamt-1.49.pdf