Emilia Apostolova

2019

Combining Structured and Free-text Electronic Medical Record Data for Real-time Clinical Decision Support
Emilia Apostolova | Tony Wang | Tim Tschampel | Ioannis Koutroulis | Tom Velez
Proceedings of the 18th BioNLP Workshop and Shared Task

The goal of this work is to utilize Electronic Medical Record (EMR) data for real-time Clinical Decision Support (CDS). We present a deep learning approach, Patient Context Vectors, for combining diagnosis codes (ICD codes) and free-text notes that are available in real time. Patient Context Vectors are created by averaging ICD code embeddings, and by predicting the same vectors from free-text notes via a Convolutional Neural Network. The Patient Context Vectors are then appended to the available structured data (vital signs and lab results) to build prediction models for a specific condition. Experiments on predicting ARDS, a rare and complex condition, demonstrate the utility of Patient Context Vectors as a means of summarizing the patient history and overall condition, and show that they significantly improve the prediction model results.
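The averaging and appending steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the three-dimensional toy embeddings, the example codes, and the structured values are all hypothetical, and the Convolutional Neural Network that predicts the vector from free-text notes is not shown.

```python
from typing import Dict, List, Sequence


def average_embeddings(
    codes: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Average the embeddings of the known codes (hypothetical helper).

    Codes without an embedding are skipped; if none are known,
    a zero vector of length `dim` is returned.
    """
    vectors = [embeddings[code] for code in codes if code in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def build_feature_row(
    structured: Sequence[float],
    context_vector: Sequence[float],
) -> List[float]:
    """Append the context vector to the structured features."""
    return list(structured) + list(context_vector)


# Toy embeddings with made-up values, for illustration only.
toy_embeddings = {
    "J80": [0.25, 0.5, 0.75],
    "A41.9": [0.75, 0.0, 0.25],
}

# "UNKNOWN" has no embedding and is ignored by the average.
context = average_embeddings(["J80", "A41.9", "UNKNOWN"], toy_embeddings, dim=3)

# Hypothetical structured inputs (e.g. one vital sign and one lab value).
row = build_feature_row([88.0, 7.3], context)
```

The resulting `row` is the kind of combined feature vector that a downstream prediction model would consume; how the embeddings themselves are trained is outside the scope of this sketch.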

2018

Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data
R. Andrew Kreek | Emilia Apostolova
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

Industry datasets used for text classification are rarely created for that purpose. In most cases, the data and target predictions are a by-product of accumulated historical data, typically fraught with noise that is present both in the text-based documents and in the target labels. In this work, we address the question of how well performance metrics computed on noisy, historical data reflect the performance on the intended future machine learning model input. The results demonstrate the utility of dirty training datasets used to build prediction models for cleaner (and different) prediction inputs.

2017

Toward Automated Early Sepsis Alerting: Identifying Infection Patients from Nursing Notes
Emilia Apostolova | Tom Velez
BioNLP 2017

Severe sepsis and septic shock are conditions that affect millions of patients and have a mortality rate close to 50%. Early identification of at-risk patients significantly improves outcomes. Electronic surveillance tools have been developed to monitor structured Electronic Medical Records and automatically recognize early signs of sepsis. However, many sepsis risk factors (e.g. symptoms and signs of infection) are often captured only in free-text clinical notes. In this study, we developed a method for automatic monitoring of nursing notes for signs and symptoms of infection. We utilized a creative approach to automatically generate an annotated dataset. The dataset was used to create a Machine Learning model that achieved an F1-score ranging from 79% to 96%.

Manual text annotation is a resource-consuming endeavor necessary for NLP systems when they target new tasks or domains for which there are no existing annotated corpora. Distributing the annotation work across multiple contributors is a natural solution to reduce and manage the effort required. Although there are a few publicly available tools which support distributed collaborative text annotation, most of them have complex user interfaces and require a significant amount of involvement from the annotators/contributors as well as the project developers and administrators. We present Djangology, a light-weight web application for highly distributed annotation projects. The application takes advantage of recent advances in web framework architecture that allow rapid development and deployment of web applications, thus minimizing development time for customization. The application's web-based interface gives project administrators the ability to easily upload data, define project schemas, assign annotators, monitor progress, and review inter-annotator agreement statistics. The intuitive web-based user interface encourages annotator participation, as contributors are not burdened by tool manuals, local installation, or configuration. The system has achieved a user response rate of 70% in two annotation projects involving more than 250 medical experts from various geographic locations.
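The abstract does not say which inter-annotator agreement statistics the tool reports. As one common example, and purely as an assumption, the sketch below computes Cohen's kappa for two annotators who labeled the same items. The function name and the toy labels are hypothetical and are not taken from Djangology.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy annotations for four items (illustrative only).
annotator_1 = ["pos", "pos", "neg", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg"]
kappa = cohens_kappa(annotator_1, annotator_2)  # observed 0.75, expected 0.5
```

With these toy labels, the annotators agree on three of four items, and chance agreement is 0.5, so kappa is 0.5. Projects with more than two annotators per item would need a different statistic, such as Fleiss' kappa.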