ReproHum #0712-01: Human Evaluation Reproduction Report for “Hierarchical Sketch Induction for Paraphrase Generation”

Mohammad Arvan, Natalie Parde


Abstract
Human evaluations are indispensable in the development of NLP systems because they provide direct insights into how effectively these systems meet real-world needs and expectations. Ensuring the reproducibility of these evaluations is vital for maintaining credibility in natural language processing research. This paper presents our reproduction of the human evaluation experiments conducted by Hosking et al. (2022) for their paraphrase generation approach. Through careful replication, we found that our results closely align with those in the original study, indicating a high degree of reproducibility.
Anthology ID:
2024.humeval-1.18
Volume:
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues:
HumEval | WS
Publisher:
ELRA and ICCL
Pages:
210–220
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2024.humeval-1.18/
Cite (ACL):
Mohammad Arvan and Natalie Parde. 2024. ReproHum #0712-01: Human Evaluation Reproduction Report for “Hierarchical Sketch Induction for Paraphrase Generation”. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 210–220, Torino, Italia. ELRA and ICCL.
Cite (Informal):
ReproHum #0712-01: Human Evaluation Reproduction Report for “Hierarchical Sketch Induction for Paraphrase Generation” (Arvan & Parde, HumEval 2024)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/2024.humeval-1.18.pdf