ReproHum #0067-01: A Reproduction of the Evaluation of Cross-Lingual Summarization

Supryadi, Chuang Liu, Deyi Xiong


Abstract
Human evaluation is crucial as it offers a nuanced understanding that automated metrics often miss. By reproducing a human evaluation, we can gain a better understanding of the original results. This paper is part of the ReproHum project, whose goal is to reproduce human evaluations from previous studies. We report the results of our reproduction of the human evaluation of cross-lingual summarization conducted by (CITATION). Comparing the original and reproduction studies, we find that our overall findings are largely consistent with those of the previous study. However, there are notable differences between the two studies in the evaluation scores for certain model outputs. These discrepancies highlight the importance of carefully selecting evaluation methodologies and human annotators.
Anthology ID:
2025.gem-1.56
Volume:
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)
Month:
July
Year:
2025
Address:
Vienna, Austria and virtual meeting
Editors:
Kaustubh Dhole, Miruna Clinciu
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
609–614
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.56/
Cite (ACL):
Supryadi, Chuang Liu, and Deyi Xiong. 2025. ReproHum #0067-01: A Reproduction of the Evaluation of Cross-Lingual Summarization. In Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²), pages 609–614, Vienna, Austria and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
ReproHum #0067-01: A Reproduction of the Evaluation of Cross-Lingual Summarization (Supryadi et al., GEM 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.gem-1.56.pdf