Jannik Pedersen
2023
Investigating anatomical bias in clinical machine learning algorithms
Jannik Pedersen | Martin Laursen | Pernille Vinholt | Anne Alnor | Thiusius Savarimuthu
Findings of the Association for Computational Linguistics: EACL 2023
Clinical machine learning algorithms have shown promising results and could potentially be implemented in clinical practice to provide diagnosis support and improve patient treatment. Barriers to realising the algorithms’ full potential include bias, which is systematic and unfair discrimination against certain individuals in favor of others. The objective of this work is to measure anatomical bias in clinical text algorithms. We define anatomical bias as unfair algorithmic outcomes against patients with medical conditions in specific anatomical locations. We measure the degree of anatomical bias across two machine learning models and two Danish clinical text classification tasks, and find that clinical text algorithms are highly prone to anatomical bias. We argue that datasets for creating clinical text algorithms should be curated carefully to isolate the effect of anatomical location in order to avoid bias against patient subgroups.
MeDa-BERT: A medical Danish pretrained transformer model
Jannik Pedersen | Martin Laursen | Pernille Vinholt | Thiusius Rajeeth Savarimuthu
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
This paper introduces a medical Danish BERT-based language model (MeDa-BERT) and medical Danish word embeddings. The word embeddings and MeDa-BERT were pretrained on a new medical Danish corpus consisting of 133M tokens from medical Danish books and text from the internet. The models showed improved performance over general-domain models on medical Danish classification tasks. The medical word embeddings and MeDa-BERT are publicly available.
Danish Clinical Named Entity Recognition and Relation Extraction
Martin Laursen | Jannik Pedersen | Rasmus Hansen | Thiusius Rajeeth Savarimuthu | Pernille Vinholt
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Electronic health records contain important information about patients’ medical history, but much of this information is stored in unstructured narrative text. This paper presents the first Danish clinical named entity recognition and relation extraction dataset for extraction of six types of clinical events, six types of attributes, and three types of relations. The dataset contains 11,607 paragraphs from Danish electronic health records containing 54,631 clinical events, 41,954 attributes, and 14,604 relations. We detail the methodology of developing the annotation scheme, and train a transformer-based architecture on the developed dataset, achieving macro F1 scores of 60.05%, 44.85%, and 70.64% for clinical events, attributes, and relations, respectively.
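The macro F1 figures above are unweighted averages of per-class F1, so a rare class counts as much as a frequent one. The following is a minimal sketch of that metric for single-label predictions. The function name `macro_f1` and the toy labels `A`, `B`, `C` are illustrative assumptions and do not come from the paper or its dataset, and the authors' exact evaluation protocol (for example, span matching rules) may differ.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only (not from the dataset).
gold = ["A", "B", "A", "C"]
pred = ["A", "A", "A", "C"]
score = macro_f1(gold, pred)
```

In this toy case class `B` is never predicted correctly, so its F1 of 0 pulls the average down even though three of the four predictions are right.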