Once Upon a Replication: It is Humans’ Turn to Evaluate AI’s Understanding of Children’s Stories for QA Generation

Andra-Maria Florescu; Marius Micluta-Campeanu; Liviu P. Dinu

Abstract

This paper presents the outcomes of a collaborative human evaluation experiment from Track B of the ReproNLP 2024 shared task, part of the ReproHum project. Using human evaluators, we evaluated a question-answer generation (QAG) system for English children’s storybooks that was presented in previous research. The system generated relevant question-answer (QA) pairs from FairytaleQA, a dataset of storybooks for early education (kindergarten up to middle school). Within the framework of the ReproHum project, we first outline the original paper and the reproduction strategy that was decided upon. We then describe the complete setup of the original human evaluation, along with the modifications required to replicate it. We also discuss other relevant related work on this subject. Finally, we compare the replication outcomes with those documented in the original publication, and we discuss the general features of this endeavor as well as its shortcomings.

Anthology ID: 2024.humeval-1.10
Volume: Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024
Month: May
Year: 2024
Address: Torino, Italia
Editors: Simone Balloccu, Anya Belz, Rudali Huidrom, Ehud Reiter, Joao Sedoc, Craig Thomson
Venues: HumEval | WS
Publisher: ELRA and ICCL
Pages: 106–113
URL: https://aclanthology.org/2024.humeval-1.10
Cite (ACL): Andra-Maria Florescu, Marius Micluta-Campeanu, and Liviu P. Dinu. 2024. Once Upon a Replication: It is Humans’ Turn to Evaluate AI’s Understanding of Children’s Stories for QA Generation. In Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pages 106–113, Torino, Italia. ELRA and ICCL.
Cite (Informal): Once Upon a Replication: It is Humans’ Turn to Evaluate AI’s Understanding of Children’s Stories for QA Generation (Florescu et al., HumEval-WS 2024)
PDF: https://preview.aclanthology.org/landing_page/2024.humeval-1.10.pdf
Optional supplementary material: 2024.humeval-1.10.OptionalSupplementaryMaterial.zip