Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System

Rudali Huidrom, Ondřej Dušek, Zdeněk Kasner, Thiago Castro Ferreira, Anya Belz
Abstract
In this paper, we present the results of two reproduction studies for the human evaluation originally reported by Dušek and Kasner (2020), in which the authors comparatively evaluated outputs produced by a semantic error detection system for data-to-text generation against reference outputs. In the first reproduction, the original evaluators repeat the evaluation, in a test of the repeatability of the original evaluation. In the second study, two new evaluators carry out the evaluation task, in a test of the reproducibility of the original evaluation under otherwise identical conditions. We describe our approach to reproduction, and present and analyse results, finding different degrees of reproducibility depending on result type, data and labelling task. Our resources are available and open-sourced.
Anthology ID:
2022.inlg-genchal.9
Volume:
Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges
Month:
July
Year:
2022
Address:
Waterville, Maine, USA and virtual meeting
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
52–61
URL:
https://aclanthology.org/2022.inlg-genchal.9
Cite (ACL):
Rudali Huidrom, Ondřej Dušek, Zdeněk Kasner, Thiago Castro Ferreira, and Anya Belz. 2022. Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System. In Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges, pages 52–61, Waterville, Maine, USA and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System (Huidrom et al., INLG 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.inlg-genchal.9.pdf