@inproceedings{liu-etal-2026-sciciteval,
    title = "{S}ci{C}ite{V}al: A Multi-Domain Dataset for Scientific Citation Verification",
    author = "Liu, Qinyue and
      Zhou, Yongxin and
      Labbe, Cyril",
    editor = "Piperidis, Stelios and
      Bel, N{\'u}ria and
      van den Heuvel, Henk and
      Ide, Nancy and
      Krek, Simon and
      Toral, Antonio",
    booktitle = "International Conference on Language Resources and Evaluation",
    month = may,
    year = "2026",
    address = "Palma de Mallorca, Spain",
    publisher = "ELRA Language Resource Association",
    url = "https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.125/",
    pages = "1603--1611",
    abstract = "Citations are an integral and important part of scientific papers. However, there exist erroneous citations ranging from careless mistakes to deliberate misconduct, and there are currently few studies or benchmark datasets dedicated to automated citation verification. To bridge this gap, we introduce SciCiteVal, a novel, manually annotated dataset for citation verification. Each instance in SciCiteVal pairs a citation context from a citing paper with the corresponding evidence passage extracted from the full text of the cited source. The dataset features a comprehensive taxonomy, where each citation is annotated as ``Correct'', ``Incorrect'', or ``Unrelated'', with the ``Incorrect'' category further divided into five fine-grained sub-categories. The completed dataset comprises over 1,000 annotated citations, distributed as 302 ``Correct'', 302 ``Incorrect'', and 430 ``Unrelated'' instances. We establish a benchmark by evaluating different Large Language Models (LLMs), providing baseline performance and a detailed analysis. We release SciCiteVal as a resource to support the development of citation verification systems and to facilitate research on evidence-based tasks."
}
@comment{Markdown (Informal):
[SciCiteVal: A Multi-Domain Dataset for Scientific Citation Verification](https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.125/) (Liu et al., LREC 2026)
ACL
}