Abstract
This paper reports a reproduction study of the human evaluation of role-oriented dialogue summarization models, conducted as part of the ReproNLP Shared Task 2023 on Reproducibility of Evaluations in NLP. We outline the differences between the original study's experimental design and our reproduction study, along with the outcomes obtained. Inter-annotator agreement in the reproduction study is lower than in the original: 0.40 versus 0.48. Of the six conclusions drawn in the original study, four are validated in our reproduction. We confirm the effectiveness of the proposed approach on the overall metric, albeit with slightly weaker relative performance than reported in the original study. Furthermore, we raise an open question: how can subjective practices in the original study be identified and addressed when conducting reproduction studies?

- Anthology ID: 2023.humeval-1.10
- Volume: Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems
- Month: September
- Year: 2023
- Address: Varna, Bulgaria
- Editors: Anya Belz, Maja Popović, Ehud Reiter, Craig Thomson, João Sedoc
- Venues: HumEval | WS
- Publisher: INCOMA Ltd., Shoumen, Bulgaria
- Pages: 124–129
- URL: https://aclanthology.org/2023.humeval-1.10
- Cite (ACL): Mingqi Gao, Jie Ruan, and Xiaojun Wan. 2023. A Reproduction Study of the Human Evaluation of Role-Oriented Dialogue Summarization Models. In Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems, pages 124–129, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal): A Reproduction Study of the Human Evaluation of Role-Oriented Dialogue Summarization Models (Gao et al., HumEval-WS 2023)
- PDF: https://preview.aclanthology.org/ingest-bitext-workshop/2023.humeval-1.10.pdf