Empowering Knowledge Discovery from Scientific Literature: A novel approach to Research Artifact Analysis
Petros Stavropoulos, Ioannis Lyris, Natalia Manola, Ioanna Grypari, Haris Papageorgiou
Abstract
Knowledge extraction from scientific literature is a major challenge, crucial to promoting transparency, reproducibility, and innovation in the research community. In this work, we present a novel approach to the identification, extraction, and analysis of dataset and code/software mentions within scientific literature. We introduce a comprehensive dataset, synthetically generated by ChatGPT and meticulously curated, augmented, and expanded with real snippets of scientific text from full-text publications in Computer Science using a human-in-the-loop process. The dataset contains snippets highlighting mentions of the two research artifact (RA) types, dataset and code/software, along with metadata including their Name, Version, License, and URL, as well as the intended Usage and Provenance. We also fine-tune a simple Large Language Model (LLM) using Low-Rank Adaptation (LoRA), casting Research Artifact Analysis (RAA) as an instruction-based Question Answering (QA) task. Finally, we report the improvements in performance on the test set of our dataset compared to other base LLMs. Our method is a significant step towards accurate, effective, and efficient extraction of datasets and software from scientific papers, helping to address the challenges of reproducibility and reusability in scientific research.
- Anthology ID:
- 2023.nlposs-1.5
- Volume:
- Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Liling Tan, Dmitrijs Milajevs, Geeticka Chauhan, Jeremy Gwinnup, Elijah Rippeth
- Venues:
- NLPOSS | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 37–53
- URL:
- https://aclanthology.org/2023.nlposs-1.5
- DOI:
- 10.18653/v1/2023.nlposs-1.5
- Cite (ACL):
- Petros Stavropoulos, Ioannis Lyris, Natalia Manola, Ioanna Grypari, and Haris Papageorgiou. 2023. Empowering Knowledge Discovery from Scientific Literature: A novel approach to Research Artifact Analysis. In Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023), pages 37–53, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Empowering Knowledge Discovery from Scientific Literature: A novel approach to Research Artifact Analysis (Stavropoulos et al., NLPOSS-WS 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2023.nlposs-1.5.pdf
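The abstract describes casting Research Artifact Analysis as instruction-based question answering over text snippets, with one answer per metadata field (Name, Version, License, URL, Usage, Provenance). As a rough illustration of what such a framing could look like, the sketch below builds one prompt per field for a single snippet. The prompt template, the question wording, and the names `FIELDS`, `QAPrompt`, and `build_prompts` are assumptions made for this example; the page does not give the authors' actual prompts or code.

```python
from dataclasses import dataclass

# Metadata fields named in the abstract. Everything else here
# (template, question wording, helper names) is hypothetical.
FIELDS = ["Name", "Version", "License", "URL", "Usage", "Provenance"]
ARTIFACT_TYPES = ("dataset", "code/software")


@dataclass
class QAPrompt:
    field: str  # which metadata field the question targets
    text: str   # the full instruction-style prompt


def build_prompts(snippet: str, artifact_type: str) -> list[QAPrompt]:
    """Turn one snippet into one instruction-style question per metadata field."""
    if artifact_type not in ARTIFACT_TYPES:
        raise ValueError(f"unknown artifact type: {artifact_type!r}")
    prompts = []
    for field in FIELDS:
        text = (
            "### Instruction:\n"
            f"Answer the question about the {artifact_type} mentioned in the snippet.\n"
            f"### Snippet:\n{snippet.strip()}\n"
            f"### Question:\nWhat is the {field} of the {artifact_type}?\n"
            "### Answer:\n"
        )
        prompts.append(QAPrompt(field=field, text=text))
    return prompts


if __name__ == "__main__":
    example = "We train our model on the SQuAD v1.1 dataset."
    for prompt in build_prompts(example, "dataset"):
        print(prompt.text)
```

Each prompt ends at the answer slot, so a fine-tuned model would be expected to complete it with the field value or an indication that the snippet does not state it. How unanswerable fields are handled is not specified on this page.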