Abstract
This paper presents a reproduction study of a human evaluation in the data-to-text generation task. The evaluation focuses on counting the supported and contradicting facts generated by a neural data-to-text model with a macro planning stage. The model is tested by generating sports summaries for the ROTOWIRE dataset. We first describe the reproduction approach agreed upon in the context of the ReproHum project. We then detail the full configuration of the original human evaluation and the adaptations that had to be made to reproduce it. Finally, we compare the reproduction results with those reported in the paper taken as reference.

- Anthology ID: 2023.humeval-1.5
- Volume: Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems
- Month: September
- Year: 2023
- Address: Varna, Bulgaria
- Editors: Anya Belz, Maja Popović, Ehud Reiter, Craig Thomson, João Sedoc
- Venues: HumEval | WS
- Publisher: INCOMA Ltd., Shoumen, Bulgaria
- Pages: 49–68
- URL: https://preview.aclanthology.org/remove-affiliations/2023.humeval-1.5/
- Cite (ACL): Javier González Corbelle, Jose Alonso, and Alberto Bugarín-Diz. 2023. Some lessons learned reproducing human evaluation of a data-to-text system. In Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems, pages 49–68, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal): Some lessons learned reproducing human evaluation of a data-to-text system (González Corbelle et al., HumEval 2023)
- PDF: https://preview.aclanthology.org/remove-affiliations/2023.humeval-1.5.pdf