NSF-SciFy: Mining the NSF Awards Database for Scientific Claims

Delip Rao, Weiqiu You, Eric Wong, Chris Callison-Burch


Abstract
We introduce NSF-SciFy, a comprehensive dataset of scientific claims and investigation proposals extracted from National Science Foundation award abstracts. While previous scientific claim verification datasets have been limited in size and scope, NSF-SciFy represents a significant advance with an estimated 2.8 million claims from 400,000 abstracts spanning all science and mathematics disciplines. We present two focused subsets: NSF-SciFy-MatSci with 114,000 claims from materials science awards, and NSF-SciFy-20K with 135,000 claims across five NSF directorates. Using zero-shot prompting, we develop a scalable approach for joint extraction of scientific claims and investigation proposals. We demonstrate the dataset’s utility through three downstream tasks: non-technical abstract generation, claim extraction, and investigation proposal extraction. Fine-tuning language models on our dataset yields substantial improvements, with relative gains often exceeding 100%, particularly for claim and proposal extraction tasks. Our error analysis reveals that extracted claims exhibit high precision but lower recall, suggesting opportunities for further methodological refinement. NSF-SciFy enables new research directions in large-scale claim verification, scientific discovery tracking, and meta-scientific analysis.
Anthology ID:
2025.newsum-main.13
Volume:
Proceedings of The 5th New Frontiers in Summarization Workshop
Month:
November
Year:
2025
Address:
Hybrid
Editors:
Yue Dong, Wen Xiao, Haopeng Zhang, Rui Zhang, Ori Ernst, Lu Wang, Fei Liu
Venues:
NewSum | WS
Publisher:
Association for Computational Linguistics
Pages:
183–198
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.newsum-main.13/
Cite (ACL):
Delip Rao, Weiqiu You, Eric Wong, and Chris Callison-Burch. 2025. NSF-SciFy: Mining the NSF Awards Database for Scientific Claims. In Proceedings of The 5th New Frontiers in Summarization Workshop, pages 183–198, Hybrid. Association for Computational Linguistics.
Cite (Informal):
NSF-SciFy: Mining the NSF Awards Database for Scientific Claims (Rao et al., NewSum 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.newsum-main.13.pdf