Data Drift in Clinical Outcome Prediction from Admission Notes
Paul Grundmann, Jens-Michalis Papaioannou, Tom Oberhauser, Thomas Steffek, Amy Siu, Wolfgang Nejdl, Alexander Loeser
Abstract
Clinical NLP research faces a scarcity of publicly available datasets due to privacy concerns. MIMIC-III marked a significant milestone, enabling substantial progress, and now, with MIMIC-IV, the dataset has expanded significantly, offering a broader scope. In this paper, we focus on the task of predicting clinical outcomes from clinical text. This is crucial in modern healthcare, aiding in preventive care, differential diagnosis, and capacity planning. We introduce a novel clinical outcome prediction dataset derived from MIMIC-IV. Furthermore, we provide initial insights into the performance of models trained on MIMIC-III when applied to our new dataset, with specific attention to potential data drift. We investigate challenges tied to evolving documentation standards and changing codes in the International Classification of Diseases (ICD) taxonomy, such as the transition from ICD-9 to ICD-10. We also explore variations in clinical text across different hospital wards. Our study aims to probe the robustness and generalization of clinical outcome prediction models, contributing to the ongoing advancement of clinical NLP in healthcare.
- Anthology ID:
- 2024.lrec-main.391
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 4381–4391
- URL:
- https://aclanthology.org/2024.lrec-main.391
- Cite (ACL):
- Paul Grundmann, Jens-Michalis Papaioannou, Tom Oberhauser, Thomas Steffek, Amy Siu, Wolfgang Nejdl, and Alexander Loeser. 2024. Data Drift in Clinical Outcome Prediction from Admission Notes. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4381–4391, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Data Drift in Clinical Outcome Prediction from Admission Notes (Grundmann et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2024.lrec-main.391.pdf