Human Evaluation Reproduction Report for Data-to-text Generation with Macro Planning

Mohammad Arvan, Natalie Parde


Abstract
This paper presents a partial reproduction study of Data-to-text Generation with Macro Planning by Puduppully et al. (2021). This work was conducted as part of the ReproHum project, a multi-lab effort to reproduce the results of NLP papers that incorporate human evaluations. We follow the instructions provided by the original authors and the ReproHum team as closely as possible. We collect preference ratings for three evaluation criteria, in the following order: conciseness, coherence, and grammaticality. Our results are highly correlated with those of the original experiment. Nonetheless, we believe the presented results are insufficient to conclude that the Macro system proposed and developed in the original paper is superior to the other systems. We expect that combining our results with the three other reproductions of this paper conducted through the ReproHum project will paint a clearer picture. Overall, we hope that our work is a step towards a more transparent and reproducible research landscape.
Anthology ID:
2023.humeval-1.8
Volume:
Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Anya Belz, Maja Popović, Ehud Reiter, Craig Thomson, João Sedoc
Venues:
HumEval | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
89–96
URL:
https://aclanthology.org/2023.humeval-1.8
Cite (ACL):
Mohammad Arvan and Natalie Parde. 2023. Human Evaluation Reproduction Report for Data-to-text Generation with Macro Planning. In Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems, pages 89–96, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Human Evaluation Reproduction Report for Data-to-text Generation with Macro Planning (Arvan & Parde, HumEval-WS 2023)
PDF:
https://preview.aclanthology.org/landing_page/2023.humeval-1.8.pdf