2024
Is That the Right Dose? Investigating Generative Language Model Performance on Veterinary Prescription Text Analysis
Brian Hur | Lucy Lu Wang | Laura Hardefeldt | Meliha Yetisgen
Proceedings of the 23rd Workshop on Biomedical Natural Language Processing
Optimizing antibiotic dosing recommendations is a vital aspect of antimicrobial stewardship (AMS) programs aimed at combating antimicrobial resistance (AMR), a significant public health concern, where inappropriate dosing contributes to the selection of AMR pathogens. A key challenge is the extraction of dosing information, which is embedded in free-text clinical records and necessitates numerical transformations. This paper assesses the utility of Large Language Models (LLMs) in extracting essential prescription attributes such as dose, duration, active ingredient, and indication. We evaluate methods to optimize LLMs on this task against a baseline BERT-based ensemble model. Our findings reveal that LLMs can achieve exceptional accuracy by combining probabilistic predictions with deterministic calculations, enforced through functional prompting, to ensure data types and execute necessary arithmetic. This research demonstrates new prospects for automating aspects of AMS when no training data is available.
Overview of the MEDIQA-M3G 2024 Shared Task on Multilingual Multimodal Medical Answer Generation
Wen-wai Yim | Asma Ben Abacha | Yujuan Fu | Zhaoyi Sun | Fei Xia | Meliha Yetisgen | Martin Krallinger
Proceedings of the 6th Clinical Natural Language Processing Workshop
Remote patient care provides opportunities for expanding medical access, saving healthcare costs, and offering convenient on-demand services. In the MEDIQA-M3G 2024 Shared Task, researchers explored solutions for the specific task of dermatological consumer health visual question answering, where user-generated queries and images are used as input and a free-text answer is generated as output. In this novel challenge, eight teams with a total of 48 submissions were evaluated across three language test sets. In this work, we provide a summary of the dataset, as well as results and approaches. We hope that the insights learned here will inspire future research directions that can lead to technology that reduces clinical workload and improves care.
Overview of the MEDIQA-CORR 2024 Shared Task on Medical Error Detection and Correction
Asma Ben Abacha | Wen-wai Yim | Yujuan Fu | Zhaoyi Sun | Fei Xia | Meliha Yetisgen
Proceedings of the 6th Clinical Natural Language Processing Workshop
Automatic detection and correction of medical errors enables a more rigorous validation of medical documentation as well as clinical notes generated by large language models. Such solutions can ensure the accuracy and medical coherence of clinical texts and enhance patient care and health outcomes. The MEDIQA-CORR 2024 shared task focused on detecting and correcting different types of medical errors in clinical texts. Seventeen teams participated in the shared task and experimented with a broad range of approaches and models. In this paper, we describe the MEDIQA-CORR task, datasets, and the participants’ results and methods.
A Novel Corpus of Annotated Medical Imaging Reports and Information Extraction Results Using BERT-based Language Models
Namu Park | Kevin Lybarger | Giridhar Kaushik Ramachandran | Spencer Lewis | Aashka Damani | Özlem Uzuner | Martin Gunn | Meliha Yetisgen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Medical imaging is critical to the diagnosis, surveillance, and treatment of many health conditions, including oncological, neurological, cardiovascular, and musculoskeletal disorders, among others. Radiologists interpret these complex, unstructured images and articulate their assessments through narrative reports that remain largely unstructured. This unstructured narrative must be converted into a structured semantic representation to facilitate secondary applications such as retrospective analyses or clinical decision support. Here, we introduce the Corpus of Annotated Medical Imaging Reports (CAMIR), which includes 609 annotated radiology reports from three imaging modality types: Computed Tomography, Magnetic Resonance Imaging, and Positron Emission Tomography-Computed Tomography. Reports were annotated using an event-based schema that captures clinical indications, lesions, and medical problems. Each event consists of a trigger and multiple arguments, and a majority of the argument types, including anatomy, normalize the spans to pre-defined concepts to facilitate secondary use. CAMIR uniquely combines a granular event structure and concept normalization. To extract CAMIR events, we explored two BERT (Bidirectional Encoder Representations from Transformers)-based architectures, including an existing architecture (mSpERT) that jointly extracts all event information and a multi-step approach (PL-Marker++) that we augmented for the CAMIR schema.
Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods
Yujuan Fu | Giridhar Kaushik Ramachandran | Nicholas J. Dobbins | Namu Park | Michael Leu | Abby R. Rosenberg | Kevin Lybarger | Fei Xia | Özlem Uzuner | Meliha Yetisgen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Social determinants of health (SDoH) play a critical role in shaping health outcomes, particularly in pediatric populations where interventions can have long-term implications. SDoH are frequently studied in the Electronic Health Record (EHR), which provides a rich repository for diverse patient data. In this work, we present a novel annotated corpus, the Pediatric Social History Annotation Corpus (PedSHAC), and evaluate the automatic extraction of detailed SDoH representations using fine-tuned and in-context learning methods with Large Language Models (LLMs). PedSHAC comprises annotated social history sections from 1,260 clinical notes obtained from pediatric patients within the University of Washington (UW) hospital system. Employing an event-based annotation scheme, PedSHAC captures ten distinct health determinants to encompass living and economic stability, prior trauma, education access, substance use history, and mental health, with an overall annotator agreement of 81.9 F1. Our proposed fine-tuned LLM-based extractors achieve high performance at 78.4 F1 for event arguments. In-context learning approaches with GPT-4 demonstrate promise for reliable SDoH extraction with limited annotated examples, with extraction performance at 82.3 F1 for event triggers.
To Err Is Human, How about Medical Large Language Models? Comparing Pre-trained Language Models for Medical Assessment Errors and Reliability
Wen-wai Yim | Yujuan Fu | Asma Ben Abacha | Meliha Yetisgen
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Unpredictability, especially unpredictability with unknown error characteristics, is a highly undesirable trait, particularly in medical patient care applications. Although large pre-trained language models (LLMs) have been applied to a variety of unseen tasks with highly competitive and successful results, their sensitivity to language inputs and resulting performance variability are not well studied. In this work, we test state-of-the-art pre-trained language models from a variety of families to characterize their error generation and reliability in medical assessment ability. In particular, we experiment with general medical assessment multiple-choice tests, as well as their open-ended and true-false alternatives. We also profile model consistency and error agreement among models and with humans, and finally quantify their ability to recover and explain errors. The findings in this work can be used to give further information about medical models so that modelers can make better-informed decisions rather than relying on standalone performance metrics.
2023
Building blocks for complex tasks: Robust generative event extraction for radiology reports under domain shifts
Sitong Zhou | Meliha Yetisgen | Mari Ostendorf
Proceedings of the 5th Clinical Natural Language Processing Workshop
This paper explores methods for extracting information from radiology reports that generalize across exam modalities to reduce requirements for annotated data. We demonstrate that multi-pass T5-based text-to-text generative models exhibit better generalization across exam modalities compared to approaches that employ BERT-based task-specific classification layers. We then develop methods that reduce the inference cost of the model, making large-scale corpus processing more feasible for clinical applications. Specifically, we introduce a generative technique that decomposes complex tasks into smaller subtask blocks, which improves a single-pass model when combined with multitask training. In addition, we leverage target-domain contexts during inference to enhance domain adaptation, enabling use of smaller models. Analyses offer insights into the benefits of different cost reduction strategies.
Prompt-based Extraction of Social Determinants of Health Using Few-shot Learning
Giridhar Kaushik Ramachandran | Yujuan Fu | Bin Han | Kevin Lybarger | Nic Dobbins | Ozlem Uzuner | Meliha Yetisgen
Proceedings of the 5th Clinical Natural Language Processing Workshop
Social determinants of health (SDOH) documented in the electronic health record through unstructured text are increasingly being studied to understand how SDOH impact patient health outcomes. In this work, we utilize the Social History Annotation Corpus (SHAC), a multi-institutional corpus of de-identified social history sections annotated for SDOH, including substance use, employment, and living status information. We explore the automatic extraction of SDOH information with SHAC in both standoff and inline annotation formats using GPT-4 in a one-shot prompting setting. We compare GPT-4 extraction performance with a high-performing supervised approach and perform thorough error analyses. Our prompt-based GPT-4 method achieved an overall 0.652 F1 on the SHAC test set, similar to the 7th best-performing system among all teams in the n2c2 challenge with SHAC.
Overview of the MEDIQA-Chat 2023 Shared Tasks on the Summarization & Generation of Doctor-Patient Conversations
Asma Ben Abacha | Wen-wai Yim | Griffin Adams | Neal Snider | Meliha Yetisgen
Proceedings of the 5th Clinical Natural Language Processing Workshop
Automatic generation of clinical notes from doctor-patient conversations can play a key role in reducing doctors’ daily workload and improving their interactions with patients. MEDIQA-Chat 2023 aims to advance and promote research on effective solutions through shared tasks on the automatic summarization of doctor-patient conversations and on the generation of synthetic dialogues from clinical notes for data augmentation. Seventeen teams participated in the challenge and experimented with a broad range of approaches and models. In this paper, we describe the three MEDIQA-Chat 2023 tasks, the datasets, and the participants’ results and methods. We hope that these shared tasks will lead to additional research efforts and insights on the automatic generation and evaluation of clinical notes.
2021
Towards Automating Medical Scribing : Clinic Visit Dialogue2Note Sentence Alignment and Snippet Summarization
Wen-wai Yim | Meliha Yetisgen
Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations
Medical conversations from patient visits are routinely summarized into clinical notes for documentation of clinical care. The automatic creation of clinical notes is particularly challenging given that it requires summarization over spoken language and multiple speaker turns; in addition, clinical notes include highly technical semi-structured text. In this paper, we describe our corpus creation method and baseline systems for two NLP tasks, clinical dialogue2note sentence alignment and clinical dialogue2note snippet summarization. These two systems, as well as other models created from such a corpus, may be incorporated as parts of an overall end-to-end clinical note generation system.
2020
Alignment Annotation for Clinic Visit Dialogue to Clinical Note Sentence Language Generation
Wen-wai Yim | Meliha Yetisgen | Jenny Huang | Micah Grossman
Proceedings of the Twelfth Language Resources and Evaluation Conference
For every patient’s visit to a clinician, a clinical note is generated documenting their medical conversation, including complaints discussed, treatments, and medical plans. Despite advances in natural language processing, automating clinical note generation from a clinic visit conversation is a largely unexplored area of research. Due to the idiosyncrasies of the task, traditional methods of corpus creation are not sufficiently effective for this problem. In this paper, we present an annotation methodology that is content- and technique-agnostic while associating note sentences to sets of dialogue sentences. The sets can further be grouped with higher-order tags to mark sets with related information. This direct linkage from input to output decouples the annotation from specific language understanding or generation strategies. Here we provide data statistics and qualitative analysis describing the unique annotation challenges. Given enough annotated data, such a resource would support multiple modeling methods including information extraction with template language generation, information retrieval-type language generation, or sequence-to-sequence modeling.
2017
Clinical Event Detection with Hybrid Neural Architecture
Adyasha Maharana | Meliha Yetisgen
BioNLP 2017
Event detection from clinical notes has traditionally been solved with rule-based and statistical natural language processing (NLP) approaches that require extensive domain knowledge and feature engineering. In this paper, we explore the feasibility of approaching this task with recurrent neural networks and clinical word embeddings, and we introduce a hybrid architecture to improve detection for entities with smaller representation in the dataset. A comparative analysis reveals the complementary behavior of neural networks and conditional random fields in clinical entity detection.
2016
Annotating and Detecting Medical Events in Clinical Notes
Prescott Klassen | Fei Xia | Meliha Yetisgen
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Early detection and treatment of diseases that develop after a patient is admitted to a hospital, such as pneumonia, is critical to improving healthcare and reducing its costs. Previous studies (Tepper et al., 2013) showed that change-of-state events in clinical notes could be important cues for phenotype detection. In this paper, we extend the annotation schema proposed by Klassen et al. (2014) to mark change-of-state events, diagnosis events, coordination, and negation. After completing the annotation, we build NLP systems to automatically identify named entities and medical events, which yield F-scores of 94.7% and 91.8%, respectively.
2015
In-depth annotation for patient level liver cancer staging
Wen-wai Yim | Sharon Kwan | Meliha Yetisgen
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis
Annotation of Clinically Important Follow-up Recommendations in Radiology Reports
Meliha Yetisgen | Prescott Klassen | Lucas McCarthy | Elena Pellicer | Tom Payne | Martin Gunn
Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis
2014
Annotating Clinical Events in Text Snippets for Phenotype Detection
Prescott Klassen | Fei Xia | Lucy Vanderwende | Meliha Yetisgen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Early detection and treatment of diseases that develop after a patient is admitted to a hospital, such as pneumonia, is critical to improving healthcare and reducing its costs. NLP systems that analyze the narrative data embedded in clinical artifacts such as x-ray reports can help support early detection. In this paper, we consider the importance of identifying the change of state for events, in particular clinical events that measure and compare the multiple states of a patient’s health across time. We propose a schema for event annotation comprising five fields and create preliminary annotation guidelines for annotators to apply the schema. We then train annotators, measure their performance, and finalize our guidelines. With the complete guidelines, we then annotate a corpus of snippets extracted from chest x-ray reports in order to integrate the annotations as a new source of features for classification tasks.
Biomedical/Clinical NLP
Ozlem Uzuner | Meliha Yetişgen | Amber Stubbs
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Tutorial Abstracts
2013
Annotating Change of State for Clinical Events
Lucy Vanderwende | Fei Xia | Meliha Yetisgen-Yildiz
Workshop on Events: Definition, Detection, Coreference, and Representation
Identification of Patients with Acute Lung Injury from Free-Text Chest X-Ray Reports
Meliha Yetisgen-Yildiz | Cosmin Bejan | Mark Wurfel
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing
2012
Statistical Section Segmentation in Free-Text Clinical Records
Michael Tepper | Daniel Capurro | Fei Xia | Lucy Vanderwende | Meliha Yetisgen-Yildiz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Automatically segmenting and classifying clinical free text into sections is an important first step to automatic information retrieval, information extraction, and data mining tasks, as it helps to ground the significance of the text within. In this work, we describe our approach to automatic section segmentation of clinical records such as hospital discharge summaries and radiology reports, along with section classification into pre-defined section categories. We apply machine learning to the problems of section segmentation and section classification, comparing a joint (one-step) and a pipeline (two-step) approach. We demonstrate that our systems perform well when tested on three data sets, two for hospital discharge summaries and one for radiology reports. We then show the usefulness of section information by incorporating it in the task of extracting comorbidities from discharge summaries.
2010
Annotating Large Email Datasets for Named Entity Recognition with Mechanical Turk
Nolan Lawson | Kevin Eustice | Mike Perkowitz | Meliha Yetisgen-Yildiz
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk
Preliminary Experiments with Amazon’s Mechanical Turk for Annotating Medical Named Entities
Meliha Yetisgen-Yildiz | Imre Solti | Fei Xia | Scott Halgrim
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk