Large Language Models for Scientific Information Extraction: An Empirical Study for Virology

Mahsa Shamsabadi, Jennifer D’Souza, Sören Auer


Abstract
In this paper, we champion the use of structured and semantic content representation of discourse-based scholarly communication, inspired by tools like Wikipedia infoboxes or structured Amazon product descriptions. These representations provide users with a concise overview, aiding scientists in navigating the dense academic landscape. Our novel automated approach leverages the robust text generation capabilities of LLMs to produce structured scholarly contribution summaries, offering both a practical solution and insights into LLMs’ emergent abilities. For LLMs, the prime focus is on improving their general intelligence as conversational agents. We argue that these models can also be applied effectively in information extraction (IE), specifically in complex IE tasks within terse domains like Science. This paradigm shift replaces the traditional modular, pipelined machine learning approach with a simpler objective expressed through instructions. Our results show that finetuned FLAN-T5 with 1000x fewer parameters than the state-of-the-art GPT-davinci is competitive for the task.
Anthology ID:
2024.findings-eacl.26
Volume:
Findings of the Association for Computational Linguistics: EACL 2024
Month:
March
Year:
2024
Address:
St. Julian’s, Malta
Editors:
Yvette Graham, Matthew Purver
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
374–392
URL:
https://aclanthology.org/2024.findings-eacl.26
Cite (ACL):
Mahsa Shamsabadi, Jennifer D’Souza, and Sören Auer. 2024. Large Language Models for Scientific Information Extraction: An Empirical Study for Virology. In Findings of the Association for Computational Linguistics: EACL 2024, pages 374–392, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal):
Large Language Models for Scientific Information Extraction: An Empirical Study for Virology (Shamsabadi et al., Findings 2024)
PDF:
https://preview.aclanthology.org/dois-2013-emnlp/2024.findings-eacl.26.pdf
Software:
 2024.findings-eacl.26.software.zip
Note:
 2024.findings-eacl.26.note.zip
Video:
 https://preview.aclanthology.org/dois-2013-emnlp/2024.findings-eacl.26.mp4