ReproHum #0669-08: Reproducing Sentiment Transfer Evaluation

Kristýna Onderková, Mateusz Lango, Patrícia Schmidtová, Ondrej Dusek


Abstract
We describe a reproduction of a human annotation experiment originally performed to evaluate the effectiveness of text style transfer systems (Reif et al., 2021). Despite our efforts to closely replicate the conditions of the original study, the results we obtained differ significantly from those it reported. We performed a statistical analysis of our results, discussed the sources of these discrepancies in the study design, and quantified reproducibility. The reproduction followed the common reproduction approach adopted by the ReproHum project.
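The abstract states that reproducibility was quantified but not how. As a hedged illustration only: ReproHum reproductions commonly report the small-sample corrected coefficient of variation (CV*) from Quantified Reproducibility Assessment, and we assume, without confirmation from this page, that a measure of this kind is meant. The sketch below computes CV* between an original score and its reproduction. `cv_star` and `scale_min` are hypothetical names, and the input values are made up rather than taken from the paper.

```python
import statistics


def cv_star(scores, scale_min=0.0):
    """Small-sample corrected coefficient of variation (CV*), in percent.

    Assumption: follows the QRA-style definition
    CV* = (1 + 1/(4n)) * (s / |mean|) * 100, with s the sample standard
    deviation (n - 1 denominator). Scores are first shifted so that the
    rating scale starts at 0 (scale_min is a hypothetical parameter).
    """
    shifted = [x - scale_min for x in scores]
    n = len(shifted)
    if n < 2:
        raise ValueError("CV* needs at least two scores")
    mean = statistics.mean(shifted)
    if mean == 0:
        raise ValueError("CV* is undefined when the mean is 0")
    sd = statistics.stdev(shifted)  # sample standard deviation (n - 1)
    return (1 + 1 / (4 * n)) * (sd / abs(mean)) * 100


# Hypothetical original vs. reproduced mean ratings on a 1-5 scale
# (illustrative values, not results from the paper).
print(round(cv_star([4.0, 3.0], scale_min=1.0), 1))  # prints 31.8
```

Shifting the scale to start at 0 matters because a coefficient of variation depends on where zero lies; without the shift, the same disagreement on a 1-5 scale would yield a smaller, less comparable value.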
Anthology ID:
2025.gem-1.55
Volume:
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Month:
July
Year:
2025
Address:
Vienna, Austria and virtual meeting
Editors:
Kaustubh Dhole, Miruna Clinciu
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
601–608
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.55/
Cite (ACL):
Kristýna Onderková, Mateusz Lango, Patrícia Schmidtová, and Ondrej Dusek. 2025. ReproHum #0669-08: Reproducing Sentiment Transfer Evaluation. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 601–608, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
ReproHum #0669-08: Reproducing Sentiment Transfer Evaluation (Onderková et al., GEM 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.55.pdf