The Shared Task on Reproducibility of Evaluations in NLP (ReproNLP) 2026: Overview and Results

Anya Belz, Craig Thomson, Javier González Corbelle


Abstract
We present the 2026 Shared Task on Reproducibility of Evaluations in NLP (ReproNLP’26), which followed on from five predecessor shared tasks on reproducibility of evaluations: ReproNLP’25, ReproNLP’24, ReproNLP’23, ReproGen’22 and ReproGen’21. This shared task series forms part of an ongoing research programme designed to develop the theory and practice of reproducibility assessment in NLP and machine learning, against a backdrop of increasing recognition of the importance of the topic across the two fields. We describe the ReproNLP’26 shared task, summarise results from the reproduction studies submitted, and provide additional comparative analysis of their results.
Anthology ID:
2026.gem-main.83
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
1055–1070
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.83/
Cite (ACL):
Anya Belz, Craig Thomson, and Javier González Corbelle. 2026. The Shared Task on Reproducibility of Evaluations in NLP (ReproNLP) 2026: Overview and Results. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 1055–1070, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
The Shared Task on Reproducibility of Evaluations in NLP (ReproNLP) 2026: Overview and Results (Belz et al., GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.83.pdf