A Framework for Flexible Extraction of Clinical Event Contextual Properties from Electronic Health Records

Shubham Agarwal, Thomas Searle, Mart Ratas, Anthony Shek, James Teo, Richard Dobson


Abstract
Electronic Health Records contain vast amounts of valuable clinical data, much of which is stored as unstructured text. Extracting meaningful clinical events (e.g., disorders, symptoms, findings, medications, and procedures etc.) in context within real-world healthcare settings is crucial for enabling downstream applications such as disease prediction, clinical coding for billing and decision support.After Named Entity Recognition and Linking (NER+L) methodology, the identified concepts need to be further classified (i.e. contextualized) for distinct properties such as their relevance to the patient, their temporal and negated status for meaningful clinical use. We present a solution that, using an existing NER+L approach - MedCAT, classifies and contextualizes medical entities at scale. We evaluate the NLP approaches through 14 distinct real-world clinical text classification projects, testing our suite of models tailored to different clinical NLP needs. For tasks requiring high minority class recall, BERT proves the most effective when coupled with class imbalance mitigation techniques, outperforming Bi-LSTM with up to 28%. For majority class focused tasks, Bi-LSTM offers a lightweight alternative with, on average, 32% faster training time and lower computational cost. Importantly, these tools are integrated into an openly available library, enabling users to select the best model for their specific downstream applications.
Anthology ID:
2025.acl-industry.66
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Georg Rehm, Yunyao Li
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
946–959
Language:
URL:
https://preview.aclanthology.org/landing_page/2025.acl-industry.66/
DOI:
Bibkey:
Cite (ACL):
Shubham Agarwal, Thomas Searle, Mart Ratas, Anthony Shek, James Teo, and Richard Dobson. 2025. A Framework for Flexible Extraction of Clinical Event Contextual Properties from Electronic Health Records. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 946–959, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
A Framework for Flexible Extraction of Clinical Event Contextual Properties from Electronic Health Records (Agarwal et al., ACL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/landing_page/2025.acl-industry.66.pdf