Normalizing Health Concepts with Biomedical Embedding and LLMs

Iram Azam; Keyuan Jiang; Gordon Bernard

Abstract

Accurate normalization of health-related expressions to standardized biomedical concepts is crucial for both healthcare and biomedical research. However, traditional string-based matching methods are limited by lexical variations. In this study, we propose a neural embedding-based normalization framework that utilizes an embedding model trained on biomedical terminology, generating over 3.59 million embeddings corresponding to UMLS terms and Concept Unique Identifiers (CUIs). For clinical data, CUIs were retrieved via semantic matching, while Twitter phrases were first processed using a large language model (LLM) to generate preferred terms prior to embedding-based CUI retrieval. Our approach substantially outperforms exact string matching and MetaMap Lite. For clinical data (3,144 phrases), normalization accuracy improved from 0.679 (string match) and 0.574 (MetaMap Lite) to 0.858. For Twitter data (102 phrases), accuracy increased from 0.235 (string match) and 0.118 (MetaMap Lite) to a range of 0.882 (Gemini 2.5 Flash) to 0.980 (GPT-4o mini). These findings highlight both the effectiveness of embedding-based semantic retrieval and the ability of LLMs to generate preferred terms, enhancing robustness in health concept normalization across diverse text sources.
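The retrieval step described in the abstract, embedding a phrase and returning the CUI of its nearest indexed term, can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the `embed` function is a toy character-trigram stand-in for the biomedical embedding model, the five-entry `TERM_INDEX` stands in for the 3.59 million UMLS term embeddings, and the identifiers such as `CUI-HEADACHE` are hypothetical placeholders rather than real CUIs.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for an embedding model: character n-gram counts (assumption)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical miniature term index: (term, placeholder concept identifier).
TERM_INDEX = [
    ("headache", "CUI-HEADACHE"),
    ("cephalalgia", "CUI-HEADACHE"),
    ("fever", "CUI-FEVER"),
    ("pyrexia", "CUI-FEVER"),
    ("nausea", "CUI-NAUSEA"),
]

# Embed every indexed term once, ahead of any query.
INDEX_VECTORS = [(embed(term), term, cui) for term, cui in TERM_INDEX]


def normalize(phrase):
    """Return (cui, matched_term, score) for the most similar indexed term."""
    query = embed(phrase)
    score, term, cui = max(
        (cosine(query, vec), term, cui) for vec, term, cui in INDEX_VECTORS
    )
    return cui, term, score


if __name__ == "__main__":
    for phrase in ["headaches", "fevers"]:
        print(phrase, "->", normalize(phrase))
```

In the pipeline the abstract describes for Twitter data, an LLM would first rewrite the colloquial phrase into a preferred term, and that term (not the raw phrase) would be passed to a retrieval function like `normalize`. A surface-level embedding such as the toy one here cannot link lexically unrelated synonyms, which is the gap a trained biomedical embedding model is meant to close.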

Anthology ID: 2026.healing-1.15
Volume: Proceedings of the 1st Workshop on Linguistic Analysis for Health (HeaLing 2026)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Danilova, Murathan Kurfalı, Ylva Söderfeldt, Julia Reed, Andrew Burchell
Venues: HeaLing | WS
Publisher: Association for Computational Linguistics
Pages: 180–190
URL: https://preview.aclanthology.org/ingest-eacl/2026.healing-1.15/
Cite (ACL): Iram Azam, Keyuan Jiang, and Gordon Bernard. 2026. Normalizing Health Concepts with Biomedical Embedding and LLMs. In Proceedings of the 1st Workshop on Linguistic Analysis for Health (HeaLing 2026), pages 180–190, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Normalizing Health Concepts with Biomedical Embedding and LLMs (Azam et al., HeaLing 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.healing-1.15.pdf
