Reproducibility in Computational Linguistics: Is Source Code Enough?

Mohammad Arvan, Luís Pina, Natalie Parde


Abstract
The availability of source code has been put forward as one of the most critical factors for improving the reproducibility of scientific research. This work studies trends in source code availability at major computational linguistics conferences, namely, ACL, EMNLP, LREC, NAACL, and COLING. We observe positive trends, especially in conferences that actively promote reproducibility. We follow this by conducting a reproducibility study of eight papers published in EMNLP 2021, finding that source code releases leave much to be desired. Moving forward, we suggest all conferences require self-contained artifacts and provide a venue to evaluate such artifacts at the time of publication. Authors can include small-scale experiments and explicit scripts to generate each result to improve the reproducibility of their work.
Anthology ID:
2022.emnlp-main.150
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2350–2361
URL:
https://aclanthology.org/2022.emnlp-main.150
Cite (ACL):
Mohammad Arvan, Luís Pina, and Natalie Parde. 2022. Reproducibility in Computational Linguistics: Is Source Code Enough? In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2350–2361, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Reproducibility in Computational Linguistics: Is Source Code Enough? (Arvan et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.emnlp-main.150.pdf