A Hybrid Retrieval System for Adverse Event Concept Normalization Integrating Contextual Scoring, Lexical Augmentation, and Semantic Fine-Tuning

Saipriya Dipika Vaidyanathan


Abstract
This paper presents a fully automated pipeline for normalizing adverse drug event (ADE) mentions in user-generated medical texts to MedDRA concepts. The core of the approach is a hybrid retrieval architecture that combines domain-specific phrase normalization, synonym augmentation, and explicit mappings for key symptoms, thereby improving coverage of lexical variants. For candidate generation, the system blends exact dictionary lookups with fuzzy matching, supplemented by drug-specific contextual scoring. A sentence-transformer model (distilroberta-v1) was fine-tuned on augmented phrases, and reciprocal rank fusion unifies the multiple retrieval signals.
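The reciprocal rank fusion step named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function name `reciprocal_rank_fusion`, the smoothing constant `k=60` (a value commonly used with this technique), the three example rankings, and the candidate labels are all assumptions chosen for illustration. Each retriever contributes 1 / (k + rank) for every candidate it returns, and candidates are ordered by the summed score.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked candidate lists into one list, best first.

    rankings: list of lists, each ordered from best to worst candidate.
    k: smoothing constant (assumed value; the abstract does not state one).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top candidate contributes 1 / (k + 1).
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical outputs of three retrieval signals for a single ADE mention.
lexical = ["Nausea", "Vomiting", "Dizziness"]      # exact/fuzzy dictionary match
semantic = ["Vomiting", "Nausea", "Headache"]      # embedding similarity
contextual = ["Nausea", "Headache", "Vomiting"]    # drug-specific scoring

fused = reciprocal_rank_fusion([lexical, semantic, contextual])
print(fused)
```

Because the fusion uses only rank positions, the signals do not need to produce scores on a comparable scale, which is one plausible reason to prefer it when mixing lexical and embedding-based retrievers.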
Anthology ID: 2025.alta-main.19
Volume: Proceedings of The 23rd Annual Workshop of the Australasian Language Technology Association
Month: November
Year: 2025
Address: Sydney, Australia
Editors: Jonathan K. Kummerfeld, Aditya Joshi, Mark Dras
Venue: ALTA
Publisher: Association for Computational Linguistics
Pages: 240–244
URL: https://preview.aclanthology.org/ingest-alta/2025.alta-main.19/
Cite (ACL): Saipriya Dipika Vaidyanathan. 2025. A Hybrid Retrieval System for Adverse Event Concept Normalization Integrating Contextual Scoring, Lexical Augmentation, and Semantic Fine-Tuning. In Proceedings of The 23rd Annual Workshop of the Australasian Language Technology Association, pages 240–244, Sydney, Australia. Association for Computational Linguistics.
Cite (Informal): A Hybrid Retrieval System for Adverse Event Concept Normalization Integrating Contextual Scoring, Lexical Augmentation, and Semantic Fine-Tuning (Vaidyanathan, ALTA 2025)
PDF: https://preview.aclanthology.org/ingest-alta/2025.alta-main.19.pdf