Betty Van Aken

Also published as: Betty van Aken


2023

pdf
Efficient Methods for Natural Language Processing: A Survey
Marcos Treviso | Ji-Ung Lee | Tianchu Ji | Betty van Aken | Qingqing Cao | Manuel R. Ciosici | Michael Hassid | Kenneth Heafield | Sara Hooker | Colin Raffel | Pedro H. Martins | André F. T. Martins | Jessica Zosa Forde | Peter Milder | Edwin Simpson | Noam Slonim | Jesse Dodge | Emma Strubell | Niranjan Balasubramanian | Leon Derczynski | Iryna Gurevych | Roy Schwartz
Transactions of the Association for Computational Linguistics, Volume 11

Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require fewer resources to achieve similar results. This survey synthesizes and relates current methods and findings in efficient NLP. We aim to provide both guidance for conducting NLP under limited resources, and point towards promising research directions for developing more efficient methods.

2022

pdf
What Do You See in this Patient? Behavioral Testing of Clinical NLP Models
Betty Van Aken | Sebastian Herrmann | Alexander Löser
Proceedings of the 4th Clinical Natural Language Processing Workshop

Decision support systems based on clinical notes have the potential to improve patient care by pointing doctors towards overseen risks. Predicting a patient’s outcome is an essential part of such systems, for which the use of deep neural networks has shown promising results. However, the patterns learned by these networks are mostly opaque and previous work revealed both reproduction of systemic biases and unexpected behavior for out-of-distribution patients. For application in clinical practice it is crucial to be aware of such behavior. We thus introduce a testing framework that evaluates clinical models regarding certain changes in the input. The framework helps to understand learned patterns and their influence on model decisions. In this work, we apply it to analyse the change in behavior with regard to the patient characteristics gender, age and ethnicity. Our evaluation of three current clinical NLP models demonstrates the concrete effects of these characteristics on the models’ decisions. They show that model behavior varies drastically even when fine-tuned on the same data with similar AUROC score. These results exemplify the need for a broader communication of model behavior in the clinical domain.

pdf
This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Clinical Text
Betty van Aken | Jens-Michalis Papaioannou | Marcel Naik | Georgios Eleftheriadis | Wolfgang Nejdl | Felix Gers | Alexander Loeser
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

The use of deep neural models for diagnosis prediction from clinical text has shown promising results. However, in clinical practice such models must not only be accurate, but provide doctors with interpretable and helpful results. We introduce ProtoPatient, a novel method based on prototypical networks and label-wise attention with both of these abilities. ProtoPatient makes predictions based on parts of the text that are similar to prototypical patients—providing justifications that doctors understand. We evaluate the model on two publicly available clinical datasets and show that it outperforms existing baselines. Quantitative and qualitative evaluations with medical doctors further demonstrate that the model provides valuable explanations for clinical decision support.

pdf
Cross-Lingual Knowledge Transfer for Clinical Phenotyping
Jens-Michalis Papaioannou | Paul Grundmann | Betty van Aken | Athanasios Samaras | Ilias Kyparissidis | George Giannakoulas | Felix Gers | Alexander Loeser
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Clinical phenotyping enables the automatic extraction of clinical conditions from patient records, which can be beneficial to doctors and clinics worldwide. However, current state-of-the-art models are mostly applicable to clinical notes written in English. We therefore investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language and have a small amount of in-domain data available. Our results reveal two strategies that outperform the state-of-the-art: Translation-based methods in combination with domain-specific encoders and cross-lingual encoders plus adapters. We find that these strategies perform especially well for classifying rare phenotypes and we advise on which method to prefer in which situation. Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.

2021

pdf
Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration
Betty van Aken | Jens-Michalis Papaioannou | Manuel Mayrdorfer | Klemens Budde | Felix Gers | Alexander Loeser
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Outcome prediction from clinical text can prevent doctors from overlooking possible risks and help hospitals to plan capacities. We simulate patients at admission time, when decision support can be especially valuable, and contribute a novel *admission to discharge* task with four common outcome prediction targets: Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction. The ideal system should infer outcomes based on symptoms, pre-conditions and risk factors of a patient. We evaluate the effectiveness of language models to handle this scenario and propose *clinical outcome pre-training* to integrate knowledge about patient outcomes from multiple public sources. We further present a simple method to incorporate ICD code hierarchy into the models. We show that our approach improves performance on the outcome tasks against several baselines. A detailed analysis reveals further strengths of the model, including transferability, but also weaknesses such as handling of vital values and inconsistencies in the underlying data.

pdf
Assertion Detection in Clinical Notes: Medical Language Models to the Rescue?
Betty van Aken | Ivana Trajanovska | Amy Siu | Manuel Mayrdorfer | Klemens Budde | Alexander Loeser
Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations

In order to provide high-quality care, health professionals must efficiently identify the presence, possibility, or absence of symptoms, treatments and other relevant entities in free-text clinical notes. Such is the task of assertion detection - to identify the assertion class (present, possible, absent) of an entity based on textual cues in unstructured text. We evaluate state-of-the-art medical language models on the task and show that they outperform the baselines in all three classes. As transferability is especially important in the medical domain we further study how the best performing model behaves on unseen data from two other medical datasets. For this purpose we introduce a newly annotated set of 5,000 assertions for the publicly available MIMIC-III dataset. We conclude with an error analysis that reveals situations in which the models still go wrong and points towards future research directions.

2018

pdf
Challenges for Toxic Comment Classification: An In-Depth Error Analysis
Betty van Aken | Julian Risch | Ralf Krestel | Alexander Löser
Proceedings of the 2nd Workshop on Abusive Language Online (ALW2)

Toxic comment classification has become an active research field with many recently proposed approaches. However, while these approaches address some of the task’s challenges others still remain unsolved and directions for further research are needed. To this end, we compare different deep learning and shallow approaches on a new, large comment dataset and propose an ensemble that outperforms all individual models. Further, we validate our findings on a second dataset. The results of the ensemble enable us to perform an extensive error analysis, which reveals open challenges for state-of-the-art methods and directions towards pending future research. These challenges include missing paradigmatic context and inconsistent dataset labels.