Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System
Rudali Huidrom, Ondřej Dušek, Zdeněk Kasner, Thiago Castro Ferreira, Anya Belz
Abstract
In this paper, we present the results of two reproduction studies for the human evaluation originally reported by Dušek and Kasner (2020), in which the authors comparatively evaluated outputs produced by a semantic error detection system for data-to-text generation against reference outputs. In the first reproduction, the original evaluators repeat the evaluation, in a test of the repeatability of the original evaluation. In the second study, two new evaluators carry out the evaluation task, in a test of the reproducibility of the original evaluation under otherwise identical conditions. We describe our approach to reproduction, and present and analyse results, finding different degrees of reproducibility depending on result type, data and labelling task. Our resources are available and open-sourced.
- Anthology ID:
- 2022.inlg-genchal.9
- Volume:
- Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges
- Month:
- July
- Year:
- 2022
- Address:
- Waterville, Maine, USA and virtual meeting
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 52–61
- URL:
- https://aclanthology.org/2022.inlg-genchal.9
- Cite (ACL):
- Rudali Huidrom, Ondřej Dušek, Zdeněk Kasner, Thiago Castro Ferreira, and Anya Belz. 2022. Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System. In Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges, pages 52–61, Waterville, Maine, USA and virtual meeting. Association for Computational Linguistics.
- Cite (Informal):
- Two Reproductions of a Human-Assessed Comparative Evaluation of a Semantic Error Detection System (Huidrom et al., INLG 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.inlg-genchal.9.pdf