Alessandro Bozzon


2020

pdf
Normalization of Long-tail Adverse Drug Reactions in Social Media
Emmanouil Manousogiannis | Sepideh Mesbah | Alessandro Bozzon | Robert-Jan Sips | Zoltan Szlanik | Selene Baez Santamaria
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

The automatic mapping of Adverse Drug Reaction (ADR) reports from user-generated content to concepts in a controlled medical vocabulary provides valuable insights for monitoring public health. While state-of-the-art deep learning-based sequence classification techniques achieve impressive performance for medical concepts with large amounts of training data, they show their limit with long-tail concepts that have a low number of training samples. The above hinders their adaptability to the changes of layman’s terminology and the constant emergence of new informal medical terms. Our objective in this paper is to tackle the problem of normalizing long-tail ADR mentions in user-generated content. In this paper, we exploit the implicit semantics of rare ADRs for which we have few training samples, in order to detect the most similar class for the given ADR. The evaluation results demonstrate that our proposed approach addresses the limitations of the existing techniques when the amount of training data is limited.

2019

pdf
Give It a Shot: Few-shot Learning to Normalize ADR Mentions in Social Media Posts
Emmanouil Manousogiannis | Sepideh Mesbah | Alessandro Bozzon | Selene Baez Santamaria | Robert Jan Sips
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

This paper describes the system that team MYTOMORROWS-TU DELFT developed for the 2019 Social Media Mining for Health Applications (SMM4H) Shared Task 3, for the end-to-end normalization of ADR tweet mentions to their corresponding MEDDRA codes. For the first two steps, we reuse a state-of-the art approach, focusing our contribution on the final entity-linking step. For that we propose a simple Few-Shot learning approach, based on pre-trained word embeddings and data from the UMLS, combined with the provided training data. Our system (relaxed F1: 0.337-0.345) outperforms the average (relaxed F1 0.2972) of the participants in this task, demonstrating the potential feasibility of few-shot learning in the context of medical text normalization.

pdf
Training Data Augmentation for Detecting Adverse Drug Reactions in User-Generated Content
Sepideh Mesbah | Jie Yang | Robert-Jan Sips | Manuel Valle Torre | Christoph Lofi | Alessandro Bozzon | Geert-Jan Houben
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Social media provides a timely yet challenging data source for adverse drug reaction (ADR) detection. Existing dictionary-based, semi-supervised learning approaches are intrinsically limited by the coverage and maintainability of laymen health vocabularies. In this paper, we introduce a data augmentation approach that leverages variational autoencoders to learn high-quality data distributions from a large unlabeled dataset, and subsequently, to automatically generate a large labeled training set from a small set of labeled samples. This allows for efficient social-media ADR detection with low training and re-training costs to adapt to the changes and emergence of informal medical laymen terms. An extensive evaluation performed on Twitter and Reddit data shows that our approach matches the performance of fully-supervised approaches while requiring only 25% of training data.