Machine Translation Human Evaluation: an investigation of evaluation based on Post-Editing and its relation with Direct Assessment

Luisa Bentivogli, Mauro Cettolo, Marcello Federico, Christian Federmann


Abstract
In this paper we present an analysis of the two most prominent methodologies used for the human evaluation of MT quality, namely evaluation based on Post-Editing (PE) and evaluation based on Direct Assessment (DA). To this end, we exploit a publicly available large dataset containing both types of evaluations. We first focus on PE and investigate how sensitive TER-based evaluation is to the type and number of references used. Then, we carry out a comparative analysis of PE and DA to investigate the extent to which evaluation results obtained by methodologies addressing different human perspectives are similar. This comparison sheds light not only on PE but also on the so-called reference bias related to monolingual DA. Finally, we analyze whether and how the two methodologies can complement each other’s weaknesses.
Anthology ID:
2018.iwslt-1.9
Volume:
Proceedings of the 15th International Conference on Spoken Language Translation
Month:
October 29-30
Year:
2018
Address:
Brussels
Editors:
Marco Turchi, Jan Niehues, Marcello Federico
Venue:
IWSLT
SIG:
SIGSLT
Publisher:
International Conference on Spoken Language Translation
Pages:
62–69
URL:
https://aclanthology.org/2018.iwslt-1.9
Cite (ACL):
Luisa Bentivogli, Mauro Cettolo, Marcello Federico, and Christian Federmann. 2018. Machine Translation Human Evaluation: an investigation of evaluation based on Post-Editing and its relation with Direct Assessment. In Proceedings of the 15th International Conference on Spoken Language Translation, pages 62–69, Brussels. International Conference on Spoken Language Translation.
Cite (Informal):
Machine Translation Human Evaluation: an investigation of evaluation based on Post-Editing and its relation with Direct Assessment (Bentivogli et al., IWSLT 2018)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2018.iwslt-1.9.pdf