2022
Building a Clinically-Focused Problem List From Medical Notes
Amir Feder | Itay Laish | Shashank Agarwal | Uri Lerner | Avel Atias | Cathy Cheung | Peter Clardy | Alon Peled-Cohen | Rachana Fellinger | Hengrui Liu | Lan Huong Nguyen | Birju Patel | Natan Potikha | Amir Taubenfeld | Liwen Xu | Seung Doo Yang | Ayelet Benjamini | Avinatan Hassidim
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)
Clinical notes often contain useful information not documented in structured data, but their unstructured nature means critical patient-related information can be missed. To increase the likelihood that this valuable information is used for patient care, algorithms that summarize notes into a problem list have been proposed. Because these solutions focus on identifying medically relevant entities in free-form text, they are often detached from a canonical ontology and do not allow downstream use of the detected text spans. Mitigating these issues, we present a system for generating a canonical problem list from medical notes, consisting of two major stages. In the first stage, annotation, we use a transformer model to detect all clinical conditions mentioned in a single note. These clinical conditions are then grounded to a predefined ontology and linked to spans in the text. In the second stage, summarization, we develop a novel algorithm that aggregates the set of clinical conditions detected across all of the patient’s notes and produces a concise patient summary that organizes their most important conditions.
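To make the two-stage structure described in the abstract concrete, here is a minimal, hedged sketch in Python. It is not the authors' system: mention detection is reduced to a dictionary lookup standing in for their transformer annotator, and the toy ontology, concept identifiers, and the frequency-plus-recency score in the summarization stage are illustrative assumptions rather than details from the paper.

```python
# Hedged sketch of the annotation -> summarization pipeline shape.
# All ontology entries, concept ids, and scoring weights are assumptions.
from collections import defaultdict
from dataclasses import dataclass

# Toy ontology: surface form -> canonical concept id (illustrative only).
ONTOLOGY = {
    "type 2 diabetes": "C0011860",
    "diabetes": "C0011849",
    "hypertension": "C0020538",
    "chest pain": "C0008031",
}

@dataclass
class Mention:
    concept_id: str   # canonical ontology concept the span was grounded to
    start: int        # character offsets of the span in the note
    end: int

def annotate(note_text: str) -> list[Mention]:
    """Stage 1 (annotation): detect condition mentions in one note and ground
    them to the ontology. A real system would use a transformer tagger plus an
    entity linker; here we simply scan for known surface forms, and overlapping
    matches (e.g. "diabetes" inside "type 2 diabetes") are not resolved."""
    text = note_text.lower()
    mentions = []
    for surface, concept_id in ONTOLOGY.items():
        idx = text.find(surface)
        if idx != -1:
            mentions.append(Mention(concept_id, idx, idx + len(surface)))
    return mentions

def summarize(notes: list[tuple[int, str]], top_k: int = 5) -> list[str]:
    """Stage 2 (summarization): aggregate grounded conditions across all of a
    patient's notes and keep the most salient ones. The frequency-plus-recency
    score is an assumed stand-in for the paper's aggregation algorithm."""
    scores = defaultdict(float)
    latest = max(ts for ts, _ in notes)
    for ts, text in notes:
        recency = 1.0 / (1 + latest - ts)
        for m in annotate(text):
            scores[m.concept_id] += 1.0 + recency
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]

# Toy patient history: (timestamp, note text) pairs.
notes = [(1, "Pt with type 2 diabetes, on metformin."),
         (2, "Follow-up: hypertension well controlled."),
         (3, "Presents with chest pain; hx of diabetes.")]
print(summarize(notes))  # -> ranked concept ids forming the problem list
```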
2021
Learning and Evaluating a Differentially Private Pre-trained Language Model
Shlomo Hoory | Amir Feder | Avichai Tendler | Sofia Erell | Alon Peled-Cohen | Itay Laish | Hootan Nakhost | Uri Stemmer | Ayelet Benjamini | Avinatan Hassidim | Yossi Matias
Findings of the Association for Computational Linguistics: EMNLP 2021
Contextual language models have led to significantly better results, especially when pre-trained on the same data as the downstream task. While this additional pre-training usually improves performance, it can lead to information leakage and therefore risks the privacy of individuals mentioned in the training data. One method to guarantee the privacy of such individuals is to train a differentially private language model, but this usually comes at the expense of model performance. Moreover, in the absence of differentially private vocabulary training, it is not possible to adapt the vocabulary to the new data, which can further degrade results. In this work we bridge these gaps and provide guidance to future researchers and practitioners on how to improve privacy while maintaining good model performance. We introduce a novel differentially private word-piece algorithm, which allows training a tailored domain-specific vocabulary while maintaining privacy. We then experiment with entity extraction tasks on clinical notes and demonstrate how to train a differentially private pre-trained language model (i.e., BERT) with a privacy guarantee of ε = 1.1 and only a small degradation in performance. Finally, since a privacy parameter ε alone does not reveal the effect of private training on the learned representation, we present experiments showing that the trained model does not memorize private information.
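As a rough illustration of what differentially private pre-training of a BERT-style model can look like in practice, the sketch below uses DP-SGD via the Opacus library. It is an assumption-laden stand-in, not the paper's implementation: the checkpoint, toy dataset, delta, clipping norm, and training schedule are placeholders, and the paper's differentially private word-piece vocabulary construction is not reproduced here; only the ε = 1.1 target echoes the abstract.

```python
# Hedged sketch: generic DP-SGD masked-language-model training with Opacus.
# NOT the paper's implementation; hyperparameters and dataset are assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling)
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
# Replace layers Opacus cannot compute per-sample gradients for; depending on
# the Opacus version, tied-weight layers in the MLM head may need extra care.
model = ModuleValidator.fix(model)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy stand-in for a de-identified clinical-note corpus (placeholder data).
texts = ["patient reports chest pain", "history of type 2 diabetes"]
enc = tokenizer(texts, truncation=True, padding=True, max_length=32)
dataset = [{"input_ids": ids, "attention_mask": mask}
           for ids, mask in zip(enc["input_ids"], enc["attention_mask"])]
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
train_loader = DataLoader(dataset, batch_size=2, collate_fn=collator)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    target_epsilon=1.1,   # privacy budget quoted in the abstract
    target_delta=1e-5,    # assumed delta, not from the paper
    epochs=1,
    max_grad_norm=1.0,    # per-sample gradient clipping bound (assumed)
)

model.train()
for batch in train_loader:
    optimizer.zero_grad()
    loss = model(**batch).loss  # MLM loss; per-sample grads are clipped and noised
    loss.backward()
    optimizer.step()

print("spent epsilon:", privacy_engine.get_epsilon(delta=1e-5))
```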