COCOA: Creation and Exploratory Investigation of a COrpus of Claims frOm NLP Articles

Clémentine Bleuze, Fanny Ducel, Maxime Amblard, Karen Fort


Abstract
Research articles are an essential pillar of scientific knowledge, but they are subject to multiple constraints. On the one hand, their scientific reliability is essential and relies in particular on the peer review process. On the other hand, they fulfill a rhetorical function of persuasion for authors who defend claims in an increasingly competitive environment. In a context of rapidly growing publication volume and quickly evolving practices, it is essential that the scientific community remain alert and critical of its own biases. In this paper, we call for an "NLP for NLP" framing of these issues. We created COCOA, a corpus of sentences from NLP papers and pre-prints published in English between 1952 and 2024, a sample of which we manually annotated with claim category labels reflecting their rhetorical function. We fine-tuned a SciBERT model to predict the remaining labels, and made both the corpus and the model available to the community. We illustrate the value of the corpus with exploratory analyses, and outline directions for further research. We hope that this work can stimulate discussions on the issues of research standardization and scientific overclaiming.
Anthology ID:
2026.lrec-main.188
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
2387–2399
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.188/
Cite (ACL):
Clémentine Bleuze, Fanny Ducel, Maxime Amblard, and Karen Fort. 2026. COCOA: Creation and Exploratory Investigation of a COrpus of Claims frOm NLP Articles. International Conference on Language Resources and Evaluation, main:2387–2399.
Cite (Informal):
COCOA: Creation and Exploratory Investigation of a COrpus of Claims frOm NLP Articles (Bleuze et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.188.pdf