Aron Henriksson


2021

Developing a Clinical Language Model for Swedish: Continued Pretraining of Generic BERT with In-Domain Data
Anastasios Lamproudis | Aron Henriksson | Hercules Dalianis
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

The use of pretrained language models, fine-tuned to perform a specific downstream task, has become widespread in NLP. Using a generic language model in specialized domains may, however, be sub-optimal due to differences in language use and vocabulary. This paper investigates whether an existing, generic language model for Swedish can be improved for the clinical domain through continued pretraining on clinical text. The generic and domain-specific language models are fine-tuned and evaluated on three representative clinical NLP tasks: (i) identifying protected health information, (ii) assigning ICD-10 diagnosis codes to discharge summaries, and (iii) sentence-level uncertainty prediction. The results show that continued pretraining on in-domain data leads to improved performance on all three downstream tasks, indicating the potential added value of domain-specific language models for clinical NLP.

2020

The Impact of De-identification on Downstream Named Entity Recognition in Clinical Text
Hanna Berg | Aron Henriksson | Hercules Dalianis
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

The impact of de-identification on data quality and, in particular, utility for developing models for downstream tasks has been more thoroughly studied for structured data than for unstructured text. While previous studies indicate that text de-identification has a limited impact on models for downstream tasks, it remains unclear what the impact is with various levels and forms of de-identification, in particular concerning the trade-off between precision and recall. This paper studies the impact of de-identification on downstream named entity recognition in Swedish clinical text. The results indicate that de-identification models with moderate to high precision lead to similar downstream performance, while low precision has a substantial negative impact. Furthermore, different strategies for concealing sensitive information affect performance to different degrees, ranging from pseudonymisation having a low impact to the removal of entire sentences with sensitive information having a high impact. This study indicates that it is possible to increase the recall of models for identifying sensitive information without negatively affecting the use of de-identified text data for training models for clinical named entity recognition; however, there is ultimately a trade-off between the level of de-identification and the subsequent utility of the data.

2015

Expanding a dictionary of marker words for uncertainty and negation using distributional semantics
Alyaa Alfalahi | Maria Skeppstedt | Rickard Ahlbom | Roza Baskalayci | Aron Henriksson | Lars Asker | Carita Paradis | Andreas Kerren
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

Representing Clinical Notes for Adverse Drug Event Detection
Aron Henriksson
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis

2014

Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)
Sumithra Velupillai | Martin Duneld | Maria Kvist | Hercules Dalianis | Maria Skeppstedt | Aron Henriksson
Proceedings of the 5th International Workshop on Health Text Mining and Information Analysis (Louhi)

EACL - Expansion of Abbreviations in CLinical text
Lisa Tengstrand | Beáta Megyesi | Aron Henriksson | Martin Duneld | Maria Kvist
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

2013

Corpus-Driven Terminology Development: Populating Swedish SNOMED CT with Synonyms Extracted from Electronic Health Records
Aron Henriksson | Maria Skeppstedt | Maria Kvist | Martin Duneld | Mike Conway
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

2011

Exploiting Structured Data, Negation Detection and SNOMED CT Terms in a Random Indexing Approach to Clinical Coding
Aron Henriksson | Martin Hassel
Proceedings of the Second Workshop on Biomedical Natural Language Processing

Something Old, Something New – Applying a Pre-trained Parsing Model to Clinical Swedish
Martin Duneld | Aron Henriksson | Sumithra Velupillai
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

2010

Levels of certainty in knowledge-intensive corpora: an initial annotation study
Aron Henriksson | Sumithra Velupillai
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing