Bias in Danish Medical Notes: Infection Classification of Long Texts Using Transformer and LSTM Architectures Coupled with BERT

Mehdi Parviz, Rudi Agius, Carsten Niemann, Rob Van Der Goot


Abstract
Medical notes contain a wealth of information related to diagnosis, prognosis, and overall patient care that can be used to help physicians make informed decisions. However, like any other data sets consisting of data from diverse demographics, they may be biased toward certain subgroups or subpopulations. Consequently, any bias in the data will be reflected in the output of the machine learning models trained on them. In this paper, we investigate the existence of such biases in Danish medical notes related to three types of blood cancer, with the goal of classifying whether the medical notes indicate severe infection. By employing a hierarchical architecture that combines a sequence model (Transformer and LSTM) with a BERT model to classify long notes, we uncover biases related to demographics and cancer types. Furthermore, we observe performance differences between hospitals. These findings underscore the importance of investigating bias in critical settings such as healthcare and the urgency of monitoring and mitigating it when developing AI-based systems.
Anthology ID:
2025.cl4health-1.27
Volume:
Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health)
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico
Editors:
Sophia Ananiadou, Dina Demner-Fushman, Deepak Gupta, Paul Thompson
Venues:
CL4Health | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
316–320
Language:
URL:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.cl4health-1.27/
DOI:
Bibkey:
Cite (ACL):
Mehdi Parviz, Rudi Agius, Carsten Niemann, and Rob Van Der Goot. 2025. Bias in Danish Medical Notes: Infection Classification of Long Texts Using Transformer and LSTM Architectures Coupled with BERT. In Proceedings of the Second Workshop on Patient-Oriented Language Processing (CL4Health), pages 316–320, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal):
Bias in Danish Medical Notes: Infection Classification of Long Texts Using Transformer and LSTM Architectures Coupled with BERT (Parviz et al., CL4Health 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.cl4health-1.27.pdf