ReproHum #0669-08: Reproducing Sentiment Transfer Evaluation
Kristýna Onderková, Mateusz Lango, Patrícia Schmidtová, Ondrej Dusek
Abstract
We describe a reproduction of a human annotation experiment that was performed to evaluate the effectiveness of text style transfer systems (Reif et al., 2021). Despite our efforts to closely imitate the conditions of the original study, the results obtained differ significantly from those in the original study. We performed a statistical analysis of the results obtained, discussed the sources of these discrepancies in the study design, and quantified reproducibility. The reproduction followed the common approach to reproduction adopted by the ReproHum project.
- Anthology ID: 2025.gem-1.55
- Volume: Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
- Month: July
- Year: 2025
- Address: Vienna, Austria and virtual meeting
- Editors: Kaustubh Dhole, Miruna Clinciu
- Venues: GEM | WS
- Publisher: Association for Computational Linguistics
- Pages: 601–608
- URL: https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.55/
- Cite (ACL): Kristýna Onderková, Mateusz Lango, Patrícia Schmidtová, and Ondrej Dusek. 2025. ReproHum #0669-08: Reproducing Sentiment Transfer Evaluation. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 601–608, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
- Cite (Informal): ReproHum #0669-08: Reproducing Sentiment Transfer Evaluation (Onderková et al., GEM 2025)
- PDF: https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.55.pdf