Thesis Proposal: Stability-Aware, Evidence-Grounded Knowledge Graph for Substance Use Disorders and Social Determinants of Health

Gautham Vijay Kumar


Abstract
Clinical Natural Language Processing (NLP) integrates large language models (LLMs) to extract biomedical insights from unstructured clinical text. Most named entity recognition (NER) and relation extraction (RE) datasets rely on manual annotation, which is costly and difficult to scale. Many biomedical knowledge graphs (KGs) suffer from underspecified relations, conflate causal and correlational claims, and contain edges that lack the evidence needed for reasoning. This dissertation presents a semantic stability framework for constructing explainable KGs, treating stable extraction as fundamental to scalable NER and RE and essential to graph structure. We apply the framework to Substance Use Disorders (SUD) and Social Determinants of Health (SDOH), using a PubMed corpus and an NER and RE annotation guide. Multiple LLMs perform extraction under shared semantic constraints, with disagreements resolved through Human-in-the-Loop (HITL) validation. We define semantic stability through NER and RE metrics and use the stabilized gold data for model training and evaluation. We then develop a claim-centered KG in which each edge carries its evidence, provenance, relation type, directionality, polarity, and stability indicators. The resulting benchmark and pipeline support multi-hop reasoning, triadic SUD–SDOH–SUD mediation patterns, and feedback loop analysis. This work will advance etiological inquiry and data-driven health policy analysis.
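To make the claim-centered edge and the triadic pattern in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the author's implementation: the names `ClaimEdge`, `agreement_stability`, `needs_human_review`, and `triadic_paths` are hypothetical, and the abstract does not specify the stability metric, so simple majority agreement across extractors stands in for it. The entity names and the `example:doc-1` style identifiers are invented placeholders, not findings.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimEdge:
    """One extracted claim, carrying the edge attributes listed in the abstract."""
    head: str
    tail: str
    relation: str      # e.g. "causes" or "associated_with" (assumed label set)
    directed: bool     # directionality
    polarity: str      # e.g. "positive" or "negative" (assumed label set)
    evidence: str      # supporting sentence from the source text
    provenance: str    # identifier of the source document (placeholder format)
    stability: float   # stability indicator in [0, 1]


def agreement_stability(labels):
    """Assumed stand-in metric: share of extractors choosing the most common label."""
    if not labels:
        return 0.0
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)


def needs_human_review(labels, threshold=1.0):
    """Flag a claim for HITL validation when agreement falls below the threshold."""
    return agreement_stability(labels) < threshold


def triadic_paths(edges, kind):
    """Return (SUD, SDOH, SUD) two-hop paths over directed edges.

    `kind` maps each entity name to "SUD" or "SDOH".
    """
    paths = []
    for first in edges:
        for second in edges:
            if (
                first.directed
                and second.directed
                and first.tail == second.head
                and kind[first.head] == "SUD"
                and kind[first.tail] == "SDOH"
                and kind[second.tail] == "SUD"
            ):
                paths.append((first.head, first.tail, second.tail))
    return paths


# Placeholder data for illustration only.
votes = ["causes", "causes", "associated_with"]  # labels from three extractors
score = agreement_stability(votes)

kind = {"SUD_A": "SUD", "SDOH_X": "SDOH", "SUD_B": "SUD"}
edges = [
    ClaimEdge("SUD_A", "SDOH_X", "causes", True, "positive",
              "placeholder sentence 1", "example:doc-1", score),
    ClaimEdge("SDOH_X", "SUD_B", "causes", True, "positive",
              "placeholder sentence 2", "example:doc-2", 1.0),
]
mediation = triadic_paths(edges, kind)
```

Keeping evidence and provenance on the edge itself means every hop in a multi-hop path can be traced back to a source sentence, while the stability field lets downstream analysis filter out claims on which the extractors disagreed.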
Anthology ID:
2026.eacl-srw.58
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Selene Baez Santamaria, Sai Ashish Somayajula, Atsuki Yamaguchi
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
787–796
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.58/
Cite (ACL):
Gautham Vijay Kumar. 2026. Thesis Proposal: Stability-Aware, Evidence-Grounded Knowledge Graph for Substance Use Disorders and Social Determinants of Health. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 787–796, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Thesis Proposal: Stability-Aware, Evidence-Grounded Knowledge Graph for Substance Use Disorders and Social Determinants of Health (Kumar, EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.58.pdf