ReproHum #0043-4: Evaluating Summarization Models: investigating the impact of education and language proficiency on reproducibility
Mateusz Lango, Patricia Schmidtova, Simone Balloccu, Ondrej Dusek
Abstract
In this paper, we describe several reproductions of a human evaluation experiment measuring the quality of automatic dialogue summarization (Feng et al., 2021). We investigate the impact of the annotators’ highest level of education, field of study, and native language on the evaluation of the informativeness of the summary. We find that the evaluation is relatively consistent regardless of these factors, but the biggest impact seems to be a prior specific background in natural language processing (as opposed to, e.g., a background in computer science). We also find that the experiment setup (asking for single vs. multiple criteria) may have an impact on the results.

- Anthology ID: 2024.humeval-1.20
- Volume: Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
- Month: May
- Year: 2024
- Address: Torino, Italia
- Editors: Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
- Venues: HumEval | WS
- Publisher: ELRA and ICCL
- Pages: 229–237
- URL: https://aclanthology.org/2024.humeval-1.20
- Cite (ACL): Mateusz Lango, Patricia Schmidtova, Simone Balloccu, and Ondrej Dusek. 2024. ReproHum #0043-4: Evaluating Summarization Models: investigating the impact of education and language proficiency on reproducibility. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 229–237, Torino, Italia. ELRA and ICCL.
- Cite (Informal): ReproHum #0043-4: Evaluating Summarization Models: investigating the impact of education and language proficiency on reproducibility (Lango et al., HumEval-WS 2024)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2024.humeval-1.20.pdf