ReproHum #0033-05: Human Evaluation Report on "Generating Scientific Definitions with Controllable Complexity"

Ines Arous, Jackie Chi Kit Cheung


Abstract
Human evaluation remains a central component of assessing NLG systems, especially for open-ended or creative generation tasks. Yet the field still lacks standardized practices for designing and reporting such evaluations. In this paper, we present a reproduction study of the human evaluation conducted by August et al. for their method of generating scientific definitions with controllable complexity. We closely replicate their experimental setup and find that our results partially align with the original findings, suggesting a moderate level of reproducibility.
Anthology ID:
2026.gem-main.89
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
1117–1126
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.89/
Cite (ACL):
Ines Arous and Jackie Chi Kit Cheung. 2026. ReproHum #0033-05: Human Evaluation Report on "Generating Scientific Definitions with Controllable Complexity". In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 1117–1126, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
ReproHum #0033-05: Human Evaluation Report on “Generating Scientific Definitions with Controllable Complexity” (Arous & Cheung, GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.89.pdf