Dmitriy Dligach


2021

EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain
Chen Lin | Timothy Miller | Dmitriy Dligach | Steven Bethard | Guergana Savova
Proceedings of the 20th Workshop on Biomedical Language Processing

Transformer-based neural language models have led to breakthroughs for a variety of natural language processing (NLP) tasks. However, most models are pretrained on general domain data. We propose a methodology to produce a model focused on the clinical domain: continued pretraining of a model with a broad representation of biomedical terminology (PubMedBERT) on a clinical corpus along with a novel entity-centric masking strategy to infuse domain knowledge in the learning process. We show that such a model achieves superior results on clinical extraction tasks by comparing our entity-centric masking strategy with classic random masking on three clinical NLP tasks: cross-domain negation detection, document time relation (DocTimeRel) classification, and temporal relation extraction. We also evaluate our models on the PubMedQA dataset to measure the models’ performance on a non-entity-centric task in the biomedical domain. The language addressed in this work is English.
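
As a rough illustration of the contrast drawn in this abstract, the sketch below compares classic random masking with one possible reading of entity-centric masking. The abstract does not spell out the procedure, so everything here is an assumption: the function names, the 15% masking budget, the half-open `(start, end)` entity spans, and the policy of masking whole entities first and topping up with random tokens. It is not the authors' implementation.

```python
import random

MASK = "[MASK]"  # placeholder mask symbol (assumption)


def random_masking(tokens, rate=0.15, rng=None):
    """Classic random masking: hide a random subset of token positions."""
    rng = rng or random.Random(0)
    budget = max(1, round(len(tokens) * rate))
    chosen = set(rng.sample(range(len(tokens)), budget))
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


def entity_centric_masking(tokens, entity_spans, rate=0.15, rng=None):
    """Hypothetical entity-first masking.

    entity_spans holds half-open (start, end) token index pairs, assumed to
    come from an upstream entity recognizer. Whole entities are masked first,
    as long as they fit in the budget; any budget left over is spent on
    randomly chosen remaining tokens.
    """
    rng = rng or random.Random(0)
    budget = max(1, round(len(tokens) * rate))
    spans = list(entity_spans)
    rng.shuffle(spans)

    chosen = set()
    for start, end in spans:
        new_positions = set(range(start, end)) - chosen
        if len(chosen) + len(new_positions) <= budget:
            chosen |= new_positions

    # Top up with random tokens that are not masked yet.
    remaining = [i for i in range(len(tokens)) if i not in chosen]
    rng.shuffle(remaining)
    chosen.update(remaining[: budget - len(chosen)])
    return [MASK if i in chosen else tok for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    sentence = ("the patient denies chest pain but reports shortness "
                "of breath since the last clinic visit in march").split()
    entities = [(3, 5), (7, 10)]  # "chest pain", "shortness of breath"
    print(random_masking(sentence))
    print(entity_centric_masking(sentence, entities))
```

Under these assumptions, both functions mask the same number of tokens; only the choice of positions differs, which keeps a comparison between the two strategies like-for-like.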

2020

Defining and Learning Refined Temporal Relations in the Clinical Narrative
Kristin Wright-Bettner | Chen Lin | Timothy Miller | Steven Bethard | Dmitriy Dligach | Martha Palmer | James H. Martin | Guergana Savova
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

We present refinements over existing temporal relation annotations in the Electronic Medical Record clinical narrative. We refined the THYME corpus annotations to more faithfully represent nuanced temporality and nuanced temporal-coreferential relations. The main contributions are in re-defining CONTAINS and OVERLAP relations into CONTAINS, CONTAINS-SUBEVENT, OVERLAP and NOTED-ON. We demonstrate that these refinements lead to substantial gains in learnability for state-of-the-art transformer models as compared to previously reported results on the original THYME corpus. We thus establish a baseline for the automatic extraction of these refined temporal relations. Although our study is done on clinical narrative, we believe it addresses far-reaching challenges that are corpus- and domain-agnostic.

A BERT-based One-Pass Multi-Task Model for Clinical Temporal Relation Extraction
Chen Lin | Timothy Miller | Dmitriy Dligach | Farig Sadeque | Steven Bethard | Guergana Savova
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing

Recently, BERT has achieved state-of-the-art performance in temporal relation extraction from clinical Electronic Medical Records text. However, the current approach is inefficient as it requires multiple passes through each input sequence. We extend a recently proposed one-pass model for relation classification to a one-pass model for relation extraction. We augment this framework by introducing global embeddings to help with long-distance relation inference, and by multi-task learning to increase model performance and generalizability. Our proposed model produces results on par with the state-of-the-art in temporal relation extraction on the THYME corpus and is much “greener” in computational cost.

2019

Extracting Adverse Drug Event Information with Minimal Engineering
Timothy Miller | Alon Geva | Dmitriy Dligach
Proceedings of the 2nd Clinical Natural Language Processing Workshop

In this paper, we evaluate the potential of classical information extraction methods to extract drug-related attributes, including adverse drug events, and compare them with more recently developed neural methods. We use the 2018 N2C2 shared task data as our gold standard data set for training. We train support vector machine classifiers to detect drug and drug attribute spans, and pair these detected entities as training instances for an SVM relation classifier, with both systems using standard features. We compare to baseline neural methods that use standard contextualized embedding representations for entity and relation extraction. The SVM-based system and a neural system obtain comparable results, with the SVM system doing better on concepts and the neural system performing better on relation extraction tasks. The neural system obtains surprisingly strong results compared to the system based on years of research in developing features for information extraction.

A BERT-based Universal Model for Both Within- and Cross-sentence Clinical Temporal Relation Extraction
Chen Lin | Timothy Miller | Dmitriy Dligach | Steven Bethard | Guergana Savova
Proceedings of the 2nd Clinical Natural Language Processing Workshop

Classic methods for clinical temporal relation extraction focus on relational candidates within a sentence. On the other hand, breakthrough Bidirectional Encoder Representations from Transformers (BERT) are trained on large quantities of arbitrary spans of contiguous text instead of sentences. In this study, we aim to build a sentence-agnostic framework for the task of CONTAINS temporal relation extraction. We establish a new state-of-the-art result for the task, 0.684F for in-domain (0.055-point improvement) and 0.565F for cross-domain (0.018-point improvement), by fine-tuning BERT and pre-training domain-specific BERT models on sentence-agnostic temporal relation instances with WordPiece-compatible encodings, and augmenting the labeled data with automatically generated “silver” instances.

Two-stage Federated Phenotyping and Patient Representation Learning
Dianbo Liu | Dmitriy Dligach | Timothy Miller
Proceedings of the 18th BioNLP Workshop and Shared Task

A large percentage of medical information is in unstructured text format in electronic medical record systems. Manual extraction of information from clinical notes is extremely time-consuming. Natural language processing has been widely used in recent years for automatic information extraction from medical texts. However, algorithms trained on data from a single healthcare provider are not generalizable and are error-prone due to the heterogeneity and uniqueness of medical documents. We develop a two-stage federated natural language processing method that enables utilization of clinical notes from different hospitals or clinics without moving the data, and demonstrate its performance using obesity and comorbidities phenotyping as the medical task. This approach not only improves the quality of a specific clinical task but also facilitates knowledge progression in the whole healthcare system, which is an essential part of a learning health system. To the best of our knowledge, this is the first application of federated machine learning in clinical NLP.

2018

Learning Patient Representations from Text
Dmitriy Dligach | Timothy Miller
Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics

Mining electronic health records for patients who satisfy a set of predefined criteria is known in medical informatics as phenotyping. Phenotyping has numerous applications such as outcome prediction, clinical trial recruitment, and retrospective studies. Supervised machine learning for phenotyping typically relies on sparse patient representations such as bag-of-words. We consider an alternative that involves learning patient representations. We develop a neural network model for learning patient representations and show that the learned representations are general enough to obtain state-of-the-art performance on a standard comorbidity detection task.

Self-training improves Recurrent Neural Networks performance for Temporal Relation Extraction
Chen Lin | Timothy Miller | Dmitriy Dligach | Hadi Amiri | Steven Bethard | Guergana Savova
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

Neural network models are often restricted by limited labeled instances and resort to advanced architectures and features for cutting-edge performance. We propose to build a recurrent neural network with multiple semantically heterogeneous embeddings within a self-training framework. Our framework makes use of labeled, unlabeled, and social media data, operates on basic features, and is scalable and generalizable. With this method, we establish the state-of-the-art result in both in-domain and cross-domain settings for a clinical temporal relation extraction task.

2017

Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks
Chen Lin | Timothy Miller | Dmitriy Dligach | Steven Bethard | Guergana Savova
BioNLP 2017

Token sequences are often used as the input for Convolutional Neural Networks (CNNs) in natural language processing. However, they might not be an ideal representation for time expressions, which are long, highly varied, and semantically complex. We describe a method for representing time expressions with single pseudo-tokens for CNNs. With this method, we establish a new state-of-the-art result for a clinical temporal relation extraction task.
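
The idea of replacing a multi-token time expression with a single pseudo-token can be sketched as follows. This is only an illustration: the abstract does not say what the pseudo-tokens look like, so the `<timex_...>` naming, the time-class labels, and the span format (half-open token indices, non-overlapping) are all assumptions rather than the paper's actual scheme.

```python
def collapse_time_expressions(tokens, timex_spans):
    """Replace each time expression span with one pseudo-token.

    timex_spans: list of (start, end, time_class) with half-open token
    indices; spans are assumed not to overlap. The pseudo-token format
    "<timex_class>" is hypothetical.
    """
    by_start = {start: (end, cls) for start, end, cls in timex_spans}
    out = []
    i = 0
    while i < len(tokens):
        if i in by_start:
            end, cls = by_start[i]
            out.append(f"<timex_{cls.lower()}>")
            i = end  # skip the rest of the time expression
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    sentence = "She was admitted on March 3 , 2014 for chest pain".split()
    print(collapse_time_expressions(sentence, [(4, 8, "DATE")]))
    # ['She', 'was', 'admitted', 'on', '<timex_date>', 'for', 'chest', 'pain']
```

The collapsed sequence is shorter and gives a CNN one consistent symbol per time expression, whatever the surface form of the date or duration.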

Neural Temporal Relation Extraction
Dmitriy Dligach | Timothy Miller | Chen Lin | Steven Bethard | Guergana Savova
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We experiment with neural architectures for temporal relation extraction and establish a new state-of-the-art for several scenarios. We find that neural models with only tokens as input outperform state-of-the-art hand-engineered feature-based models, that convolutional neural networks outperform LSTM models, and that encoding relation arguments with XML tags outperforms a traditional position-based encoding.
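
To make the two argument encodings mentioned in this abstract concrete, here is a minimal sketch of each. The tag names (`<a1>`, `<a2>`), the half-open span format, and the exact definition of relative distance are assumptions chosen for illustration; the paper may use different markers and conventions.

```python
def xml_tag_encoding(tokens, arg1, arg2):
    """Mark the two relation arguments with XML-style tags in the token stream.

    arg1 and arg2 are half-open (start, end) token spans, assumed not to
    overlap. Tag names are hypothetical.
    """
    opens = {arg1[0]: "<a1>", arg2[0]: "<a2>"}
    closes = {arg1[1] - 1: "</a1>", arg2[1] - 1: "</a2>"}
    out = []
    for i, tok in enumerate(tokens):
        if i in opens:
            out.append(opens[i])
        out.append(tok)
        if i in closes:
            out.append(closes[i])
    return out


def position_encoding(tokens, arg1, arg2):
    """Pair each token with its signed distance to each argument span.

    Distance is 0 inside the span, negative before it, positive after it.
    """
    def distance(i, span):
        start, end = span
        if i < start:
            return i - start
        if i >= end:
            return i - end + 1
        return 0

    return [(tok, distance(i, arg1), distance(i, arg2))
            for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    sentence = "The biopsy was performed on Tuesday".split()
    event, timex = (1, 2), (5, 6)
    print(xml_tag_encoding(sentence, event, timex))
    print(position_encoding(sentence, event, timex))
```

With the tag encoding, the model input stays a plain token sequence; the position-based variant needs two extra feature channels per token.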

2016

Unsupervised Document Classification with Informed Topic Models
Timothy Miller | Dmitriy Dligach | Guergana Savova
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

Improving Temporal Relation Extraction with Training Instance Augmentation
Chen Lin | Timothy Miller | Dmitriy Dligach | Steven Bethard | Guergana Savova
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

2015

Extracting Time Expressions from Clinical Text
Timothy Miller | Steven Bethard | Dmitriy Dligach | Chen Lin | Guergana Savova
Proceedings of BioNLP 15

2014

Descending-Path Convolution Kernel for Syntactic Structures
Chen Lin | Timothy Miller | Alvin Kho | Steven Bethard | Dmitriy Dligach | Sameer Pradhan | Guergana Savova
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

Discovering Temporal Narrative Containers in Clinical Text
Timothy Miller | Steven Bethard | Dmitriy Dligach | Sameer Pradhan | Chen Lin | Guergana Savova
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

Active Learning for Phenotyping Tasks
Dmitriy Dligach | Timothy Miller | Guergana Savova
Proceedings of the Workshop on NLP for Medicine and Biology associated with RANLP 2013

2012

Active Learning for Coreference Resolution
Timothy Miller | Dmitriy Dligach | Guergana Savova
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

2011

Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
Dmitriy Dligach | Martha Palmer
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

VerbNet Class Assignment as a WSD Task
Susan Windisch Brown | Dmitriy Dligach | Martha Palmer
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)

Reducing the Need for Double Annotation
Dmitriy Dligach | Martha Palmer
Proceedings of the 5th Linguistic Annotation Workshop

2010

SemEval-2010 Task 14: Word Sense Induction & Disambiguation
Suresh Manandhar | Ioannis Klapaftis | Dmitriy Dligach | Sameer Pradhan
Proceedings of the 5th International Workshop on Semantic Evaluation

To Annotate More Accurately or to Annotate More
Dmitriy Dligach | Rodney Nielsen | Martha Palmer
Proceedings of the Fourth Linguistic Annotation Workshop

2009

Using Language Modeling to Select Useful Annotation Data
Dmitriy Dligach | Martha Palmer
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium

2008

Novel Semantic Features for Verb Sense Disambiguation
Dmitriy Dligach | Martha Palmer
Proceedings of ACL-08: HLT, Short Papers

2007

SemEval-2007 Task-17: English Lexical Sample, SRL and All Words
Sameer Pradhan | Edward Loper | Dmitriy Dligach | Martha Palmer
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

Criteria for the Manual Grouping of Verb Senses
Cecily Jill Duffield | Jena D. Hwang | Susan Windisch Brown | Dmitriy Dligach | Sarah E. Vieweg | Jenny Davis | Martha Palmer
Proceedings of the Linguistic Annotation Workshop