When Does Retrieval Beat Direct LLM Diagnosis in Rare Disease? An Empirical Study of Ontology Coverage

Mohamed Elmofty; Ulf Leser

Abstract

Recent high-complexity agentic systems such as DeepRare perform strongly on rare disease diagnosis benchmarks, but it remains unclear when gains come from structured knowledge access and when they come from parametric LLM knowledge. We compare phenotype-based retrieval, LLM reranking, and unrestricted LLM diagnosis across seven benchmarks covering 10,382 cases. We find a clear performance crossover driven by retrieval coverage (the fraction of cases whose true diagnosis is within the retriever’s top-50): on high-coverage datasets, ontology-based retrieval dominates; on low-coverage datasets, open-ended LLM diagnosis takes the lead. Building on this, adding an LLM reranker over retrieved candidates further improves accuracy across our patient-case benchmarks, closing most of the remaining gap to agentic systems (within 2 pp on MME and LIRICAL). We trace the crossover to two structural failure modes of ontology-based retrieval (annotation sparsity and phenotypic homogeneity) and show that aggregate scores across mixed benchmarks can hide these qualitatively different diagnostic settings. These findings motivate per-dataset evaluation and hybrid diagnostic systems that combine retrieval, reranking, and parametric LLM generation based on case characteristics.
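The coverage measure defined in the abstract can be illustrated with a minimal sketch. It assumes each case comes with a ranked list of candidate diagnosis identifiers and a single true diagnosis; the function name `retrieval_coverage`, the data layout, and the placeholder identifiers are illustrative assumptions, not taken from the paper's code.

```python
from typing import Sequence


def retrieval_coverage(
    ranked_candidates: Sequence[Sequence[str]],
    true_diagnoses: Sequence[str],
    k: int = 50,
) -> float:
    """Fraction of cases whose true diagnosis is among the top-k candidates.

    Hypothetical helper: `ranked_candidates[i]` is the retriever's ranked
    output for case i, and `true_diagnoses[i]` is that case's gold label.
    """
    if len(ranked_candidates) != len(true_diagnoses):
        raise ValueError("need exactly one ranked list per case")
    if not true_diagnoses:
        raise ValueError("no cases given")
    # A case counts as covered when its gold label appears in the first k ranks.
    hits = sum(
        truth in candidates[:k]
        for candidates, truth in zip(ranked_candidates, true_diagnoses)
    )
    return hits / len(true_diagnoses)


# Toy data with placeholder identifiers (not real ontology codes).
toy_rankings = [["D1", "D2", "D3"], ["D4", "D5"]]
toy_truth = ["D2", "D9"]
coverage = retrieval_coverage(toy_rankings, toy_truth)
```

Computing this value separately for each benchmark, rather than once over the pooled cases, matches the per-dataset evaluation the abstract argues for.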

Anthology ID: 2026.bionlp-1.41
Volume: BioNLP 2026
Month: July
Year: 2026
Address: San Diego, California
Editors: Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues: BioNLP | WS
Publisher: Association for Computational Linguistics
Pages: 508–518
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.41/
Cite (ACL): Mohamed Elmofty and Ulf Leser. 2026. When Does Retrieval Beat Direct LLM Diagnosis in Rare Disease? An Empirical Study of Ontology Coverage. In BioNLP 2026, pages 508–518, San Diego, California. Association for Computational Linguistics.
Cite (Informal): When Does Retrieval Beat Direct LLM Diagnosis in Rare Disease? An Empirical Study of Ontology Coverage (Elmofty & Leser, BioNLP 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.41.pdf
