Beyond Knowledge Graphs: PubMedBERT Embeddings as a Competitive Standalone Modality for Drug Re-purposing

Rishik Kondadadi, John E. Ortega


Abstract
Drug repurposing methods rely heavily on knowledge graph (KG) embeddings, but building and curating these graphs takes considerable effort. We present two findings on the Hetionet drug-disease benchmark and an epilepsy ranking task. First, PubMedBERT text embeddings, fed through the same downstream classifiers and identical 10-fold splits as four re-trained KG baselines (TransE, ComplEx, DistMult, RotatE), reach AUROC $0.910$, above all four (best: RotatE, $0.854$); a Random Forest on the same vectors scores $0.880$. The comparison is asymmetric in one important way: PubMedBERT was pretrained on the literature Hetionet was curated from, so the result is best read as “text-with-literature-supervision vs. graph-only,” and a head-to-head with text-augmented KG methods (KG-BERT, TxGNN) is left as follow-up. Second, across all seven combinations of text, molecular (ECFP4), and gene expression (LINCS L1000) features, cross-attention fusion of weaker modalities into text consistently degrades performance, despite a gated mechanism intended to suppress unhelpful modalities; the residual path forces the strong modality to absorb noise. The model also ranks proconvulsants (amoxapine, flumazenil) near the top, because text embeddings encode strength of association with a disease but not its direction.
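One way to picture the fusion finding is the toy sketch below. It is not the authors' architecture: it assumes a single attended vector from the weaker modality (so the attention softmax is trivially 1), a scalar sigmoid gate, and an additive residual connection. The function name `gated_residual_fusion` and all parameters are hypothetical. Under those assumptions, the gate can shrink the injected signal but never zero it, because a sigmoid is strictly positive, so the text representation always drifts by the gate value times the norm of the injected vector.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def gated_residual_fusion(text, other, gate_weights, gate_bias):
    """Toy fusion (assumed form): out = text + g * other.

    g is a scalar gate, g = sigmoid(w . [text; other] + b).
    With one key/value pair, cross-attention returns `other` unchanged,
    so the attention step is omitted here.
    """
    concat = list(text) + list(other)
    logit = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    g = sigmoid(logit)
    fused = [t + g * o for t, o in zip(text, other)]
    return fused, g


if __name__ == "__main__":
    random.seed(0)
    dim = 8
    text = [random.gauss(0.0, 1.0) for _ in range(dim)]
    noise = [random.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for a weak modality
    weights = [0.0] * (2 * dim)  # gate driven by the bias only, for clarity

    # Even a strongly negative bias leaves a nonzero gate, hence nonzero drift.
    for bias in (0.0, -2.0, -6.0):
        fused, g = gated_residual_fusion(text, noise, weights, bias)
        drift = l2_norm([f - t for f, t in zip(fused, text)])
        print(f"bias={bias:+.1f}  gate={g:.4f}  drift from text={drift:.4f}")
```

In this simplified form the only way to recover the text-only representation exactly is to bypass the residual sum altogether, which is consistent with the abstract's observation that the strong modality ends up absorbing noise.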
Anthology ID:
2026.bionlp-1.13
Volume:
BioNLP 2026
Month:
July
Year:
2026
Address:
San Diego, California
Editors:
Dina Demner-Fushman, Sophia Ananiadou, Kirk Roberts, Junichi Tsujii
Venues:
BioNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
135–140
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.13/
Cite (ACL):
Rishik Kondadadi and John E. Ortega. 2026. Beyond Knowledge Graphs: PubMedBERT Embeddings as a Competitive Standalone Modality for Drug Re-purposing. In BioNLP 2026, pages 135–140, San Diego, California. Association for Computational Linguistics.
Cite (Informal):
Beyond Knowledge Graphs: PubMedBERT Embeddings as a Competitive Standalone Modality for Drug Re-purposing (Kondadadi & Ortega, BioNLP 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bionlp-1.13.pdf