Anastasios Lamproudis


2021

pdf bib
Developing a Clinical Language Model for Swedish: Continued Pretraining of Generic BERT with In-Domain Data
Anastasios Lamproudis | Aron Henriksson | Hercules Dalianis
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

The use of pretrained language models, fine-tuned to perform a specific downstream task, has become widespread in NLP. Using a generic language model in specialized domains may, however, be sub-optimal due to differences in language use and vocabulary. In this paper, it is investigated whether an existing, generic language model for Swedish can be improved for the clinical domain through continued pretraining with clinical text. The generic and domain-specific language models are fine-tuned and evaluated on three representative clinical NLP tasks: (i) identifying protected health information, (ii) assigning ICD-10 diagnosis codes to discharge summaries, and (iii) sentence-level uncertainty prediction. The results show that continued pretraining on in-domain data leads to improved performance on all three downstream tasks, indicating that there is a potential added value of domain-specific language models for clinical NLP.

pdf bib
Multi-label Diagnosis Classification of Swedish Discharge Summaries – ICD-10 Code Assignment Using KB-BERT
Sonja Remmer | Anastasios Lamproudis | Hercules Dalianis
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

The International Classification of Diseases (ICD) is a system for systematically recording patients’ diagnoses. Clinicians or professional coders assign ICD codes to patients’ medical records to facilitate funding, research, and administration. In most health facilities, clinical coding is a manual, time-demanding task that is prone to errors. A tool that automatically assigns ICD codes to free-text clinical notes could save time and reduce erroneous coding. While many previous studies have focused on ICD coding, research on Swedish patient records is scarce. This study explored different approaches to pairing Swedish clinical notes with ICD codes. KB-BERT, a BERT model pre-trained on Swedish text, was compared to the traditional supervised learning models Support Vector Machines, Decision Trees, and K-nearest Neighbours used as the baseline. When considering ICD codes grouped into ten blocks, the KB-BERT was superior to the baseline models, obtaining an F1-micro of 0.80 and an F1-macro of 0.58. When considering the 263 full ICD codes, the KB-BERT was outperformed by all baseline models at an F1-micro and F1-macro of zero. Wilcoxon signed-rank tests showed that the performance differences between the KB-BERT and the baseline models were statistically significant.