Benchmarking zero-shot biomedical relation triplet extraction across language model architectures

Frederik Gade, Ole Lund, Marie Lisandra Mendoza


Abstract
Many language models (LMs) in the literature claim excellent zero-shot and/or few-shot capabilities for named entity recognition (NER) and relation extraction (RE) tasks and assert their ability to generalize beyond their training datasets. However, these claims have yet to be tested across different model architectures. This paper presents a performance evaluation of zero-shot relation triplet extraction (NER followed by RE of the entities) for both small and large LMs, utilizing 13,867 texts from 61 biomedical corpora and encompassing 151 unique entity types. This comprehensive evaluation offers valuable insights into the practical applicability and performance of LMs within the intricate domain of biomedical relation triplet extraction, highlighting their effectiveness in managing a diverse range of relations and entity types. Gemini 1.5 Pro, the largest LM included in the study, was the top-performing zero-shot model, achieving an average partial match micro F1 of 0.492 for NER, followed closely by SciLitLLM 1.5 14B with a score of 0.475. Fine-tuned models generally outperformed others on the corpora they were trained on, even in a few-shot setting, but struggled to generalize across all datasets with similar entity types. No models achieved an F1 score above 0.5 for the RTE task on any dataset, and their scores fluctuated based on the specific class of entity and the dataset involved. This observation highlights that there is still substantial room for improvement in the zero-shot utility of LMs for biomedical RTE applications.
Anthology ID:
2025.bionlp-1.9
Volume:
ACL 2025
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Makoto Miwa, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
88–100
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.9/
Cite (ACL):
Frederik Gade, Ole Lund, and Marie Lisandra Mendoza. 2025. Benchmarking zero-shot biomedical relation triplet extraction across language model architectures. In ACL 2025, pages 88–100, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Benchmarking zero-shot biomedical relation triplet extraction across language model architectures (Gade et al., BioNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bionlp-1.9.pdf
Supplementary material:
2025.bionlp-1.9.SupplementaryMaterial.txt