Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

Alberto Lavelli, Anne-Lyse Minard, Fabio Rinaldi (Editors)

Anthology ID:
Brussels, Belgium
Association for Computational Linguistics
Bib Export formats:

Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis
Alberto Lavelli | Anne-Lyse Minard | Fabio Rinaldi

Detecting Diabetes Risk from Social Media Activity
Dane Bell | Egoitz Laparra | Aditya Kousik | Terron Ishihara | Mihai Surdeanu | Stephen Kobourov

This work explores the detection of individuals’ risk of type 2 diabetes mellitus (T2DM) directly from their social media (Twitter) activity. Our approach extends a deep learning architecture with several contributions: following previous observations that language use differs by gender, it captures and uses gender information through domain adaptation; it captures recency of posts under the hypothesis that more recent posts are more representative of an individual’s current risk status; and, lastly, it demonstrates that in this scenario where activity factors are sparsely represented in the data, a bag-of-word neural network model using custom dictionaries of food and activity words performs better than other neural sequence models. Our best model, which incorporates all these contributions, achieves a risk-detection F1 of 41.9, considerably higher than the baseline rate (36.9).

Treatment Side Effect Prediction from Online User-generated Content
Van Hoang Nguyen | Kazunari Sugiyama | Min-Yen Kan | Kishaloy Halder

With Health 2.0, patients and caregivers increasingly seek information regarding possible drug side effects during their medical treatments in online health communities. These are helpful platforms for non-professional medical opinions, yet pose risk of being unreliable in quality and insufficient in quantity to cover the wide range of potential drug reactions. Existing approaches which analyze such user-generated content in online forums heavily rely on feature engineering of both documents and users, and often overlook the relationships between posts within a common discussion thread. Inspired by recent advancements, we propose a neural architecture that models the textual content of user-generated documents and user experiences in online communities to predict side effects during treatment. Experimental results show that our proposed architecture outperforms baseline models.

Revisiting neural relation classification in clinical notes with external information
Simon Šuster | Madhumita Sushil | Walter Daelemans

Recently, segment convolutional neural networks have been proposed for end-to-end relation extraction in the clinical domain, achieving results comparable to or outperforming the approaches with heavy manual feature engineering. In this paper, we analyze the errors made by the neural classifier based on confusion matrices, and then investigate three simple extensions to overcome its limitations. We find that including ontological association between drugs and problems, and data-induced association between medical concepts does not reliably improve the performance, but that large gains are obtained by the incorporation of semantic classes to capture relation triggers.

Supervised Machine Learning for Extractive Query Based Summarisation of Biomedical Data
Mandeep Kaur | Diego Mollá

The automation of text summarisation of biomedical publications is a pressing need due to the plethora of information available online. This paper explores the impact of several supervised machine learning approaches for extracting multi-document summaries for given queries. In particular, we compare classification and regression approaches for query-based extractive summarisation using data provided by the BioASQ Challenge. We tackled the problem of annotating sentences for training classification systems and show that a simple annotation approach outperforms regression-based summarisation.

Comparing CNN and LSTM character-level embeddings in BiLSTM-CRF models for chemical and disease named entity recognition
Zenan Zhai | Dat Quoc Nguyen | Karin Verspoor

We compare the use of LSTM-based and CNN-based character-level word embeddings in BiLSTM-CRF models to approach chemical and disease named entity recognition (NER) tasks. Empirical results over the BioCreative V CDR corpus show that the use of either type of character-level word embeddings in conjunction with the BiLSTM-CRF models leads to comparable state-of-the-art performance. However, the models using CNN-based character-level word embeddings have a computational performance advantage, increasing training time over word-based models by 25% while the LSTM-based character-level word embeddings more than double the required training time.

Deep learning for language understanding of mental health concepts derived from Cognitive Behavioural Therapy
Lina M. Rojas-Barahona | Bo-Hsiang Tseng | Yinpei Dai | Clare Mansfield | Osman Ramadan | Stefan Ultes | Michael Crawford | Milica Gašić

In recent years, we have seen deep learning and distributed representations of words and sentences make impact on a number of natural language processing tasks, such as similarity, entailment and sentiment analysis. Here we introduce a new task: understanding of mental health concepts derived from Cognitive Behavioural Therapy (CBT). We define a mental health ontology based on the CBT principles, annotate a large corpus where this phenomena is exhibited and perform understanding using deep learning and distributed representations. Our results show that the performance of deep learning models combined with word embeddings or sentence embeddings significantly outperform non-deep-learning models in this difficult task. This understanding module will be an essential component of a statistical dialogue system delivering therapy.

Investigating the Challenges of Temporal Relation Extraction from Clinical Text
Diana Galvan | Naoaki Okazaki | Koji Matsuda | Kentaro Inui

Temporal reasoning remains as an unsolved task for Natural Language Processing (NLP), particularly demonstrated in the clinical domain. The complexity of temporal representation in language is evident as results of the 2016 Clinical TempEval challenge indicate: the current state-of-the-art systems perform well in solving mention-identification tasks of event and time expressions but poorly in temporal relation extraction, showing a gap of around 0.25 point below human performance. We explore to adapt the tree-based LSTM-RNN model proposed by Miwa and Bansal (2016) to temporal relation extraction from clinical text, obtaining a five point improvement over the best 2016 Clinical TempEval system and two points over the state-of-the-art. We deliver a deep analysis of the results and discuss the next step towards human-like temporal reasoning.

De-identifying Free Text of Japanese Dummy Electronic Health Records
Kohei Kajiyama | Hiromasa Horiguchi | Takashi Okumura | Mizuki Morita | Yoshinobu Kano

A new law was established in Japan to promote utilization of EHRs for research and developments, while de-identification is required to use EHRs. However, studies of automatic de-identification in the healthcare domain is not active for Japanese language, no de-identification tool available in practical performance for Japanese medical domains, as far as we know. Previous work shows that rule-based methods are still effective, while deep learning methods are reported to be better recently. In order to implement and evaluate a de-identification tool in a practical level, we implemented three methods, rule-based, CRF, and LSTM. We prepared three datasets of pseudo EHRs with de-identification tags manually annotated. These datasets are derived from shared task data to compare with previous work, and our new data to increase training data. Our result shows that our LSTM-based method is better and robust, which leads to our future work that plans to apply our system to actual de-identification tasks in hospitals.

Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study
Drahomira Herrmannova | Steven Young | Robert Patton | Christopher Stahl | Nicole Kleinstreuer | Mary Wolfe

Identifying and extracting data elements such as study descriptors in publication full texts is a critical yet manual and labor-intensive step required in a number of tasks. In this paper we address the question of identifying data elements in an unsupervised manner. Specifically, provided a set of criteria describing specific study parameters, such as species, route of administration, and dosing regimen, we develop an unsupervised approach to identify text segments (sentences) relevant to the criteria. A binary classifier trained to identify publications that met the criteria performs better when trained on the candidate sentences than when trained on sentences randomly picked from the text, supporting the intuition that our method is able to accurately identify study descriptors.

Identification of Parallel Sentences in Comparable Monolingual Corpora from Different Registers
Rémi Cardon | Natalia Grabar

Parallel aligned sentences provide useful information for different NLP applications. Yet, this kind of data is seldom available, especially for languages other than English. We propose to exploit comparable corpora in French which are distinguished by their registers (specialized and simplified versions) to detect and align parallel sentences. These corpora are related to the biomedical area. Our purpose is to state whether a given pair of specialized and simplified sentences is to be aligned or not. Manually created reference data show 0.76 inter-annotator agreement. We exploit a set of features and several automatic classifiers. The automatic alignment reaches up to 0.93 Precision, Recall and F-measure. In order to better evaluate the method, it is applied to data in English from the SemEval STS competitions. The same features and models are applied in monolingual and cross-lingual contexts, in which they show up to 0.90 and 0.73 F-measure, respectively.

Evaluation of a Prototype System that Automatically Assigns Subject Headings to Nursing Narratives Using Recurrent Neural Network
Hans Moen | Kai Hakala | Laura-Maria Peltonen | Henry Suhonen | Petri Loukasmäki | Tapio Salakoski | Filip Ginter | Sanna Salanterä

We present our initial evaluation of a prototype system designed to assist nurses in assigning subject headings to nursing narratives – written in the context of documenting patient care in hospitals. Currently nurses may need to memorize several hundred subject headings from standardized nursing terminologies when structuring and assigning the right section/subject headings to their text. Our aim is to allow nurses to write in a narrative manner without having to plan and structure the text with respect to sections and subject headings, instead the system should assist with the assignment of subject headings and restructuring afterwards. We hypothesize that this could reduce the time and effort needed for nursing documentation in hospitals. A central component of the system is a text classification model based on a long short-term memory (LSTM) recurrent neural network architecture, trained on a large data set of nursing notes. A simple Web-based interface has been implemented for user interaction. To evaluate the system, three nurses write a set of artificial nursing shift notes in a fully unstructured narrative manner, without planning for or consider the use of sections and subject headings. These are then fed to the system which assigns subject headings to each sentence and then groups them into paragraphs. Manual evaluation is conducted by a group of nurses. The results show that about 70% of the sentences are assigned to correct subject headings. The nurses believe that such a system can be of great help in making nursing documentation in hospitals easier and less time consuming. Finally, various measures and approaches for improving the system are discussed.

Automatically Detecting the Position and Type of Psychiatric Evaluation Report Sections
Deya Banisakher | Naphtali Rishe | Mark A. Finlayson

Psychiatric evaluation reports represent a rich and still mostly-untapped source of information for developing systems for automatic diagnosis and treatment of mental health problems. These reports contain free-text structured within sections using a convention of headings. We present a model for automatically detecting the position and type of different psychiatric evaluation report sections. We developed this model using a corpus of 150 sample reports that we gathered from the Web, and used sentences as a processing unit while section headings were used as labels of section type. From these labels we generated a unified hierarchy of labels of section types, and then learned n-gram models of the language found in each section. To model conventions for section order, we integrated these n-gram models with a Hierarchical Hidden Markov Model (HHMM) representing the probabilities of observed section orders found in the corpus, and then used this HHMM n-gram model in a decoding framework to infer the most likely section boundaries and section types for documents with their section labels removed. We evaluated our model over two tasks, namely, identifying section boundaries and identifying section types and orders. Our model significantly outperformed baselines for each task with an F1 of 0.88 for identifying section types, and a 0.26 WindowDiff (Wd) and 0.20 and (Pk) scores, respectively, for identifying section boundaries.

Iterative development of family history annotation guidelines using a synthetic corpus of clinical text
Taraka Rama | Pål Brekke | Øystein Nytrø | Lilja Øvrelid

In this article, we describe the development of annotation guidelines for family history information in Norwegian clinical text. We make use of incrementally developed synthetic clinical text describing patients’ family history relating to cases of cardiac disease and present a general methodology which integrates the synthetically produced clinical statements and guideline development. We analyze inter-annotator agreement based on the developed guidelines and present results from experiments aimed at evaluating the validity and applicability of the annotated corpus using machine learning techniques. The resulting annotated corpus contains 477 sentences and 6030 tokens. Both the annotation guidelines and the annotated corpus are made freely available and as such constitutes the first publicly available resource of Norwegian clinical text.

CAS: French Corpus with Clinical Cases
Natalia Grabar | Vincent Claveau | Clément Dalloux

Textual corpora are extremely important for various NLP applications as they provide information necessary for creating, setting and testing these applications and the corresponding tools. They are also crucial for designing reliable methods and reproducible results. Yet, in some areas, such as the medical area, due to confidentiality or to ethical reasons, it is complicated and even impossible to access textual data representative of those produced in these areas. We propose the CAS corpus built with clinical cases, such as they are reported in the published scientific literature in French. We describe this corpus, currently containing over 397,000 word occurrences, and the existing linguistic and semantic annotations.

Analysis of Risk Factor Domains in Psychosis Patient Health Records
Eben Holderness | Nicholas Miller | Kirsten Bolton | Philip Cawkwell | Marie Meteer | James Pustejovsky | Mei Hua-Hall

Readmission after discharge from a hospital is disruptive and costly, regardless of the reason. However, it can be particularly problematic for psychiatric patients, so predicting which patients may be readmitted is critically important but also very difficult. Clinical narratives in psychiatric electronic health records (EHRs) span a wide range of topics and vocabulary; therefore, a psychiatric readmission prediction model must begin with a robust and interpretable topic extraction component. We created a data pipeline for using document vector similarity metrics to perform topic extraction on psychiatric EHR data in service of our long-term goal of creating a readmission risk classifier. We show initial results for our topic extraction model and identify additional features we will be incorporating in the future.

Patient Risk Assessment and Warning Symptom Detection Using Deep Attention-Based Neural Networks
Ivan Girardi | Pengfei Ji | An-phi Nguyen | Nora Hollenstein | Adam Ivankay | Lorenz Kuhn | Chiara Marchiori | Ce Zhang

We present an operational component of a real-world patient triage system. Given a specific patient presentation, the system is able to assess the level of medical urgency and issue the most appropriate recommendation in terms of best point of care and time to treat. We use an attention-based convolutional neural network architecture trained on 600,000 doctor notes in German. We compare two approaches, one that uses the full text of the medical notes and one that uses only a selected list of medical entities extracted from the text. These approaches achieve 79% and 66% precision, respectively, but on a confidence threshold of 0.6, precision increases to 85% and 75%, respectively. In addition, a method to detect warning symptoms is implemented to render the classification task transparent from a medical perspective. The method is based on the learning of attention scores and a method of automatic validation using the same data.

Syntax-based Transfer Learning for the Task of Biomedical Relation Extraction
Joël Legrand | Yannick Toussaint | Chedy Raïssi | Adrien Coulet

Transfer learning (TL) proposes to enhance machine learning performance on a problem, by reusing labeled data originally designed for a related problem. In particular, domain adaptation consists, for a specific task, in reusing training data developed for the same task but a distinct domain. This is particularly relevant to the applications of deep learning in Natural Language Processing, because those usually require large annotated corpora that may not exist for the targeted domain, but exist for side domains. In this paper, we experiment with TL for the task of Relation Extraction (RE) from biomedical texts, using the TreeLSTM model. We empirically show the impact of TreeLSTM alone and with domain adaptation by obtaining better performances than the state of the art on two biomedical RE tasks and equal performances for two others, for which few annotated data are available. Furthermore, we propose an analysis of the role that syntactic features may play in TL for RE.

In-domain Context-aware Token Embeddings Improve Biomedical Named Entity Recognition
Golnar Sheikhshabbafghi | Inanc Birol | Anoop Sarkar

Rapidly expanding volume of publications in the biomedical domain makes it increasingly difficult for a timely evaluation of the latest literature. That, along with a push for automated evaluation of clinical reports, present opportunities for effective natural language processing methods. In this study we target the problem of named entity recognition, where texts are processed to annotate terms that are relevant for biomedical studies. Terms of interest in the domain include gene and protein names, and cell lines and types. Here we report on a pipeline built on Embeddings from Language Models (ELMo) and a deep learning package for natural language processing (AllenNLP). We trained context-aware token embeddings on a dataset of biomedical papers using ELMo, and incorporated these embeddings in the LSTM-CRF model used by AllenNLP for named entity recognition. We show these representations improve named entity recognition for different types of biomedical named entities. We also achieve a new state of the art in gene mention detection on the BioCreative II gene mention shared task.

Self-training improves Recurrent Neural Networks performance for Temporal Relation Extraction
Chen Lin | Timothy Miller | Dmitriy Dligach | Hadi Amiri | Steven Bethard | Guergana Savova

Neural network models are oftentimes restricted by limited labeled instances and resort to advanced architectures and features for cutting edge performance. We propose to build a recurrent neural network with multiple semantically heterogeneous embeddings within a self-training framework. Our framework makes use of labeled, unlabeled, and social media data, operates on basic features, and is scalable and generalizable. With this method, we establish the state-of-the-art result for both in- and cross-domain for a clinical temporal relation extraction task.

Listwise temporal ordering of events in clinical notes
Serena Jeblee | Graeme Hirst

We present metrics for listwise temporal ordering of events in clinical notes, as well as a baseline listwise temporal ranking model that generates a timeline of events that can be used in downstream medical natural language processing tasks.

Time Expressions in Mental Health Records for Symptom Onset Extraction
Natalia Viani | Lucia Yin | Joyce Kam | Ayunni Alawi | André Bittar | Rina Dutta | Rashmi Patel | Robert Stewart | Sumithra Velupillai

For psychiatric disorders such as schizophrenia, longer durations of untreated psychosis are associated with worse intervention outcomes. Data included in electronic health records (EHRs) can be useful for retrospective clinical studies, but much of this is stored as unstructured text which cannot be directly used in computation. Natural Language Processing (NLP) methods can be used to extract this data, in order to identify symptoms and treatments from mental health records, and temporally anchor the first emergence of these. We are developing an EHR corpus annotated with time expressions, clinical entities and their relations, to be used for NLP development. In this study, we focus on the first step, identifying time expressions in EHRs for patients with schizophrenia. We developed a gold standard corpus, compared this corpus to other related corpora in terms of content and time expression prevalence, and adapted two NLP systems for extracting time expressions. To the best of our knowledge, this is the first resource annotated for temporal entities in the mental health domain.

Evaluation of a Sequence Tagging Tool for Biomedical Texts
Julien Tourille | Matthieu Doutreligne | Olivier Ferret | Aurélie Névéol | Nicolas Paris | Xavier Tannier

Many applications in biomedical natural language processing rely on sequence tagging as an initial step to perform more complex analysis. To support text analysis in the biomedical domain, we introduce Yet Another SEquence Tagger (YASET), an open-source multi purpose sequence tagger that implements state-of-the-art deep learning algorithms for sequence tagging. Herein, we evaluate YASET on part-of-speech tagging and named entity recognition in a variety of text genres including articles from the biomedical literature in English and clinical narratives in French. To further characterize performance, we report distributions over 30 runs and different sizes of training datasets. YASET provides state-of-the-art performance on the CoNLL 2003 NER dataset (F1=0.87), MEDPOST corpus (F1=0.97), MERLoT corpus (F1=0.99) and NCBI disease corpus (F1=0.81). We believe that YASET is a versatile and efficient tool that can be used for sequence tagging in biomedical and clinical texts.

Learning to Summarize Radiology Findings
Yuhao Zhang | Daisy Yi Ding | Tianpei Qian | Christopher D. Manning | Curtis P. Langlotz

The Impression section of a radiology report summarizes crucial radiology findings in natural language and plays a central role in communicating these findings to physicians. However, the process of generating impressions by summarizing findings is time-consuming for radiologists and prone to errors. We propose to automate the generation of radiology impressions with neural sequence-to-sequence learning. We further propose a customized neural model for this task which learns to encode the study background information and use this information to guide the decoding process. On a large dataset of radiology reports collected from actual hospital studies, our model outperforms existing non-neural and neural baselines under the ROUGE metrics. In a blind experiment, a board-certified radiologist indicated that 67% of sampled system summaries are at least as good as the corresponding human-written summaries, suggesting significant clinical validity. To our knowledge our work represents the first attempt in this direction.