ReproHum #0669-08: Reproducing a Recipe for Arbitrary Text Style Transfer with LLMs

Saad Mahamood


Abstract
We describe our attempt to reproduce a single human evaluation quality criterion from the paper “Reproducing a Recipe for Arbitrary Text Style Transfer with LLMs”. We outline the approach taken and the challenges involved in reproducing the human evaluation as conducted by the original authors. In particular, we report the negative results obtained during the reproduction and compare our results with an earlier reproduction of the same experiment. Finally, we describe the insights gained from this reproduction and the barriers that remain to successful reproductions. We hope the results and insights presented will help the broader NLP research community improve how human evaluations are conducted and make NLP experiments more reproducible in the future.
Anthology ID:
2026.gem-main.90
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
1127–1132
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.90/
Cite (ACL):
Saad Mahamood. 2026. ReproHum #0669-08: Reproducing a Recipe for Arbitrary Text Style Transfer with LLMs. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 1127–1132, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
ReproHum #0669-08: Reproducing a Recipe for Arbitrary Text Style Transfer with LLMs (Mahamood, GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.90.pdf