Multimodal Transformers for Clinical Time Series Forecasting and Early Sepsis Prediction

Jinghua Xu, Michael Staniek


Abstract
Sepsis is a leading cause of death in intensive care units (ICUs), and early detection is crucial to patient survival. Existing work in the clinical domain focuses mainly on directly predicting a ground-truth label that is the outcome of a medical syndrome or condition such as sepsis. In this work, we focus primarily on clinical time series forecasting as an intermediate step toward solving downstream predictive tasks. We build on a strong monomodal baseline and propose multimodal transformers that use set functions to fuse physiological features and texts in electronic health record (EHR) data. Furthermore, we propose hierarchical transformers that represent clinical document time series effectively via attention mechanisms and continuous time encoding. Our multimodal models outperform the baseline on MIMIC-III data by notable margins. Our ablation analysis shows that our atomic approaches to multimodal fusion and our hierarchical transformers for document series embedding are effective in forecasting. We further fine-tune the forecasting models with labelled data and find that some of the multimodal models consistently outperform the baseline on the downstream sepsis prediction task.
Anthology ID:
2025.cl4health-1.8
Volume:
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Sophia Ananiadou, Dina Demner-Fushman, Deepak Gupta, Paul Thompson
Venues:
CL4Health | WS
Publisher:
Association for Computational Linguistics
Pages:
100–108
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.cl4health-1.8/
Cite (ACL):
Jinghua Xu and Michael Staniek. 2025. Multimodal Transformers for Clinical Time Series Forecasting and Early Sepsis Prediction. In Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health), pages 100–108, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Multimodal Transformers for Clinical Time Series Forecasting and Early Sepsis Prediction (Xu & Staniek, CL4Health 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.cl4health-1.8.pdf