Abstract
This paper describes a reproduction of a human evaluation study assessing redundancies in text automatically generated by a data-to-text system. While the scope of the original study is broader, a human evaluation (a manual error analysis) is included as part of the system evaluation. We attempt a reproduction of this human evaluation; however, while the original authors annotate multiple properties of the generated text, we focus exclusively on a single quality criterion: redundancy. Because we focused our study on a single minimal reproducible experimental unit, the experiment was fairly straightforward, and all data were made available by the authors, we encountered no challenges with our reproduction and were able to reproduce the trend found in the original experiment. However, while still confirming the general trend, we found that both of our annotators identified twice as many errors in the dataset as the original authors did.
- Anthology ID:
- 2024.humeval-1.16
- Volume:
- Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
- Venues:
- HumEval | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 163–198
- URL:
- https://aclanthology.org/2024.humeval-1.16
- Cite (ACL):
- Filip Klubička and John D. Kelleher. 2024. ReproHum #1018-09: Reproducing Human Evaluations of Redundancy Errors in Data-To-Text Systems. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 163–198, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- ReproHum #1018-09: Reproducing Human Evaluations of Redundancy Errors in Data-To-Text Systems (Klubička & Kelleher, HumEval-WS 2024)
- PDF:
- https://preview.aclanthology.org/ingest-bitext-workshop/2024.humeval-1.16.pdf