@inproceedings{bolucu-etal-2025-bridging,
title = "Bridging the Gap: Instruction-Tuned {LLM}s for Scientific Named Entity Recognition",
author = {B{\"o}l{\"u}c{\"u}, Necva and
Rybinski, Maciej and
Wan, Stephen},
editor = "Accomazzi, Alberto and
Ghosal, Tirthankar and
Grezes, Felix and
Lockhart, Kelly",
booktitle = "Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications",
month = dec,
year = "2025",
address = "Mumbai, India and virtual",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.7/",
pages = "56--71",
ISBN = "979-8-89176-310-4",
abstract = "Information extraction (IE) from scientific literature plays an important role in many information-seeking pipelines. Large Language Models (LLMs) have demonstrated strong zero-shot and few-shot performance on IE tasks. However, there are challenges in practical deployment, especially in scenarios that involve sensitive information, such as industrial research or limited budgets. A key question is whether there is a need for a fine-tuned model for optimal domain adaptation (i.e., whether in-domain labelled training data is needed, or zero-shot to few-shot effectiveness is enough). In this paper, we explore this question in the context of IE on scientific literature. We further consider methodological questions, such as alternatives to cloud-based proprietary LLMs (e.g., GPT and Claude) when these are unsuitable due to data privacy, data sensitivity, or cost reasons. This paper outlines empirical results to recommend which locally hosted open-source LLM approach to adopt and illustrates the trade-offs in domain adaptation."
}

Markdown (Informal)
[Bridging the Gap: Instruction-Tuned LLMs for Scientific Named Entity Recognition](https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.7/) (Bölücü et al., WASP 2025)