ReproHum #0031–01: Reproducing a Human Readability Evaluation for Question–Answer Generation Systems

Manuela Hürlimann, Mark Cieliebak


Abstract
Human evaluations play a central role in assessing natural language processing systems, yet their robustness and reproducibility remain incompletely understood. This paper reports on a reproduction of the human readability evaluation from Yao et al. (2022) for question–answer generation (QAG) systems, conducted within the ReproHum project and the ReproNLP 2026 shared task (Belz et al., 2026). The original evaluation compared three QAG systems with respect to three criteria. We reproduced the evaluation of one of these criteria, readability, using a new group of five evaluators. We report descriptive results, inter-annotator agreement, system-level comparisons, and cross-study robustness metrics relative to the original study and two previous reproductions. Our results support all conclusions of the original evaluation and are largely consistent with the two previous reproductions.
Anthology ID:
2026.gem-main.88
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
1111–1116
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.88/
Cite (ACL):
Manuela Hürlimann and Mark Cieliebak. 2026. ReproHum #0031–01: Reproducing a Human Readability Evaluation for Question–Answer Generation Systems. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 1111–1116, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
ReproHum #0031–01: Reproducing a Human Readability Evaluation for Question–Answer Generation Systems (Hürlimann & Cieliebak, GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.88.pdf