Data Drift in Clinical Outcome Prediction from Admission Notes

Paul Grundmann, Jens-Michalis Papaioannou, Tom Oberhauser, Thomas Steffek, Amy Siu, Wolfgang Nejdl, Alexander Loeser


Abstract
Clinical NLP research faces a scarcity of publicly available datasets due to privacy concerns. MIMIC-III marked a significant milestone, enabling substantial progress, and now, with MIMIC-IV, the dataset has expanded significantly, offering a broader scope. In this paper, we focus on the task of predicting clinical outcomes from clinical text. This is crucial in modern healthcare, aiding in preventive care, differential diagnosis, and capacity planning. We introduce a novel clinical outcome prediction dataset derived from MIMIC-IV. Furthermore, we provide initial insights into the performance of models trained on MIMIC-III when applied to our new dataset, with specific attention to potential data drift. We investigate challenges tied to evolving documentation standards and changing codes in the International Classification of Diseases (ICD) taxonomy, such as the transition from ICD-9 to ICD-10. We also explore variations in clinical text across different hospital wards. Our study aims to probe the robustness and generalization of clinical outcome prediction models, contributing to the ongoing advancement of clinical NLP in healthcare.
Anthology ID:
2024.lrec-main.391
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
4381–4391
URL:
https://aclanthology.org/2024.lrec-main.391
Cite (ACL):
Paul Grundmann, Jens-Michalis Papaioannou, Tom Oberhauser, Thomas Steffek, Amy Siu, Wolfgang Nejdl, and Alexander Loeser. 2024. Data Drift in Clinical Outcome Prediction from Admission Notes. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4381–4391, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Data Drift in Clinical Outcome Prediction from Admission Notes (Grundmann et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2024.lrec-main.391.pdf