Lena Al Mutair
2023
Enriching Electronic Health Record with Semantic Features Utilising Pretrained Transformers
Lena Al Mutair | Eric Atwell | Nishant Ravikumar
Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Electronic Health Records (EHRs) have revolutionised healthcare by enhancing patient care and facilitating provider communication. Nevertheless, extracting valuable information from EHRs efficiently remains challenging, primarily because of the overwhelming volume of unstructured data, the wide variability in data formats, and the lack of standardised labels. Leveraging deep learning and concept embeddings, we address the gap in context-aware systems for EHRs. The proposed solution was evaluated on the MIMIC-III dataset and demonstrated superior performance compared to other methodologies. We examined the positive impact of combining the latent feature with the note representation in four different settings. Model performance was evaluated in a case study using BERTScore, assessing precision, recall, and F1 scores. The model excels in Medical Natural Language Inference (MedNLI) with 89.3% accuracy, further boosted to 90.5% by retraining the embeddings using International Classification of Diseases (ICD) codes, which we formally designate as ClinicNarrIR. ClinicNarrIR was tested on 1000 randomly sampled notes, achieving an NDCG@10 score of approximately 0.54 with an accuracy@10 of 0.85. The study also demonstrates a high correlation between the results produced by the proposed representation and those of medical coders. Notably, in all evaluation cases, the best-performing base pretrained model was BlueBERT.
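For readers unfamiliar with the retrieval metrics quoted in the abstract, the following is a minimal sketch of how NDCG@10 and accuracy@10 are commonly computed for a single query. It is not the authors' evaluation code. The function names are hypothetical, the relevance labels are made up for illustration, the ideal ranking is assumed to be a re-sorting of the same judged list, and accuracy@k is assumed to mean "at least one relevant item appears in the top k", which the abstract does not define.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items of a ranked list."""
    # Rank positions are 1-based, so the discount for index i is log2(i + 2).
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG@k normalised by the DCG@k of the ideal ordering.

    Assumption: the ideal ordering is the same list sorted by relevance.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def accuracy_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """1.0 if any of the top-k items is relevant, else 0.0 (assumed definition)."""
    return 1.0 if any(rel > 0 for rel in relevances[:k]) else 0.0


# Hypothetical relevance labels for one query, in ranked order.
example = [0, 1, 0, 0, 1]
print(ndcg_at_k(example, k=10), accuracy_at_k(example, k=10))
```

Scores such as those reported above would then be averages of these per-query values over all sampled notes.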