Some lessons learned reproducing human evaluation of a data-to-text system

Javier González Corbelle, Jose Alonso, Alberto Bugarín-Diz


Abstract
This paper presents a reproduction study of a human evaluation for the data-to-text generation task. The evaluation focuses on counting the supported and contradicting facts generated by a neural data-to-text model with a macro planning stage. The model is tested by generating sports summaries for the ROTOWIRE dataset. We first describe the approach to reproduction agreed upon in the context of the ReproHum project. Then, we detail the full configuration of the original human evaluation and the adaptations that had to be made to reproduce it. Finally, we compare the reproduction results with those reported in the reference paper.
Anthology ID:
2023.humeval-1.5
Volume:
Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Anya Belz, Maja Popović, Ehud Reiter, Craig Thomson, João Sedoc
Venues:
HumEval | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
49–68
URL:
https://aclanthology.org/2023.humeval-1.5
Cite (ACL):
Javier González Corbelle, Jose Alonso, and Alberto Bugarín-Diz. 2023. Some lessons learned reproducing human evaluation of a data-to-text system. In Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems, pages 49–68, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Some lessons learned reproducing human evaluation of a data-to-text system (González Corbelle et al., HumEval-WS 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.humeval-1.5.pdf