PreClinIE: An Annotated Corpus for Information Extraction in Preclinical Studies
Simona Doneva, Hanna Hubarava, Pia Härvelid, Wolfgang Zürrer, Julia Bugajska, Bernard Hild, David Brüschweiler, Gerold Schneider, Tilia Ellendorff, Benjamin Ineichen
Abstract
Animal research, sometimes referred to as preclinical research, plays a vital role in bridging the gap between basic science and clinical applications. However, the rapid increase in publications and the complexity of reported findings make it increasingly difficult for researchers to extract and assess relevant information. While automation through natural language processing (NLP) holds great potential for addressing this challenge, progress is hindered by the absence of high-quality, comprehensive annotated resources specific to preclinical studies. To fill this gap, we introduce PreClinIE, a fully open manually annotated dataset. The corpus consists of abstracts and methods sections from 725 publications, annotated for study rigor indicators (e.g., random allocation) and other study characteristics (e.g., species). We describe the data collection and annotation process, outlining the challenges of working with preclinical literature. By providing this resource, we aim to accelerate the development of NLP tools that enhance literature mining in preclinical research.
- Anthology ID:
- 2025.bionlp-1.8
- Volume:
- ACL 2025
- Month:
- August
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Junichi Tsujii
- Venues:
- BioNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 74–87
- URL:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.8/
- Cite (ACL):
- Simona Doneva, Hanna Hubarava, Pia Härvelid, Wolfgang Zürrer, Julia Bugajska, Bernard Hild, David Brüschweiler, Gerold Schneider, Tilia Ellendorff, and Benjamin Ineichen. 2025. PreClinIE: An Annotated Corpus for Information Extraction in Preclinical Studies. In ACL 2025, pages 74–87, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- PreClinIE: An Annotated Corpus for Information Extraction in Preclinical Studies (Doneva et al., BioNLP 2025)
- PDF:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.8.pdf