BioCoref: Benchmarking Biomedical Coreference Resolution with LLMs

Nourah Salem, Elizabeth White, Michael Bada, Lawrence Hunter


Abstract
Coreference resolution in biomedical texts presents unique challenges due to complex domain-specific terminology, high ambiguity in mention forms, and long-distance dependencies between coreferring expressions. In this work, we present a comprehensive evaluation of generative large language models (LLMs) for coreference resolution in the biomedical domain. Using the CRAFT corpus as our benchmark, we assess LLM performance across four prompting experiments that vary in their use of local context, contextual enrichment, and domain-specific cues such as abbreviations and entity dictionaries.
Anthology ID:
2026.bionlp-1.42
Volume:
BioNLP 2026
Month:
July
Year:
2026
Address:
San Diego, California
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
519–530
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.42/
Cite (ACL):
Nourah Salem, Elizabeth White, Michael Bada, and Lawrence Hunter. 2026. BioCoref: Benchmarking Biomedical Coreference Resolution with LLMs. In BioNLP 2026, pages 519–530, San Diego, California. Association for Computational Linguistics.
Cite (Informal):
BioCoref: Benchmarking Biomedical Coreference Resolution with LLMs (Salem et al., BioNLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.42.pdf