Malo Ruelle




2025

The 2025 ReproNLP Shared Task on Reproducibility of Evaluations in NLP: Overview and Results
Anya Belz | Craig Thomson | Javier González Corbelle | Malo Ruelle
Proceedings of the Fourth Workshop on Generation, Evaluation and Metrics (GEM²)

This paper presents an overview of, and the results from, the 2025 Shared Task on Reproducibility of Evaluations in NLP (ReproNLP’25), which followed on from four previous shared tasks on reproducibility of evaluations: ReproNLP’24, ReproNLP’23, ReproGen’22 and ReproGen’21. This shared task series forms part of an ongoing research programme designed to develop the theory and practice of reproducibility assessment in NLP and machine learning, against a backdrop of increasing recognition of the importance of the topic across the two fields. We describe the ReproNLP’25 shared task, summarise results from the reproduction studies submitted, and provide additional comparative analysis of their results, including, for the first time, additional ‘sanity-check’ evaluations by LLMs.