Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations

Chaitanya Shivade, Rashmi Gangadharaiah, Spandana Gella, Sandeep Konam, Shaoqing Yuan, Yi Zhang, Parminder Bhatia, Byron Wallace (Editors)


Anthology ID:
2021.nlpmc-1
Month:
June
Year:
2021
Address:
Online
Venue:
NLPMC
SIG:
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/2021.nlpmc-1
DOI:
Bib Export formats:
BibTeX
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2021.nlpmc-1.pdf

pdf bib
Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations
Chaitanya Shivade | Rashmi Gangadharaiah | Spandana Gella | Sandeep Konam | Shaoqing Yuan | Yi Zhang | Parminder Bhatia | Byron Wallace

pdf bib
Would you like to tell me more? Generating a corpus of psychotherapy dialogues
Seyed Mahed Mousavi | Alessandra Cervone | Morena Danieli | Giuseppe Riccardi

The acquisition of a dialogue corpus is a key step in the process of training a dialogue model. In this context, corpora acquisitions have been designed either for open-domain information retrieval or slot-filling (e.g. restaurant booking) tasks. However, there has been scarce research in the problem of collecting personal conversations with users over a long period of time. In this paper we focus on the types of dialogues that are required for mental health applications. One of these types is the follow-up dialogue that a psychotherapist would initiate in reviewing the progress of a Cognitive Behavioral Therapy (CBT) intervention. The elicitation of the dialogues is achieved through textual stimuli presented to dialogue writers. We propose an automatic algorithm that generates textual stimuli from personal narratives collected during psychotherapy interventions. The automatically generated stimuli are presented as a seed to dialogue writers following principled guidelines. We analyze the linguistic quality of the collected corpus and compare the performances of psychotherapists and non-expert dialogue writers. Moreover, we report the human evaluation of a corpus-based response-selection model.

pdf bib
Towards Automating Medical Scribing : Clinic Visit Dialogue2Note Sentence Alignment and Snippet Summarization
Wen-wai Yim | Meliha Yetisgen

Medical conversations from patient visits are routinely summarized into clinical notes for documentation of clinical care. The automatic creation of clinical note is particularly challenging given that it requires summarization over spoken language and multiple speaker turns; as well, clinical notes include highly technical semi-structured text. In this paper, we describe our corpus creation method and baseline systems for two NLP tasks, clinical dialogue2note sentence alignment and clinical dialogue2note snippet summarization. These two systems, as well as other models created from such a corpus, may be incorporated as parts of an overall end-to-end clinical note generation system.

pdf
Gathering Information and Engaging the User ComBot: A Task-Based, Serendipitous Dialog Model for Patient-Doctor Interactions
Anna Liednikova | Philippe Jolivet | Alexandre Durand-Salmon | Claire Gardent

We focus on dialog models in the context of clinical studies where the goal is to help gather, in addition to the close information collected based on a questionnaire, serendipitous information that is medically relevant. To promote user engagement and address this dual goal (collecting both a predefined set of data points and more informal information about the state of the patients), we introduce an ensemble model made of three bots: a task-based, a follow-up and a social bot. We introduce a generic method for developing follow-up bots. We compare different ensemble configurations and we show that the combination of the three bots (i) provides a better basis for collecting information than just the information seeking bot and (ii) collects information in a more user-friendly, more efficient manner that an ensemble model combining the information seeking and the social bot.

pdf
Automatic Speech-Based Checklist for Medical Simulations
Sapir Gershov | Yaniv Ringel | Erez Dvir | Tzvia Tsirilman | Elad Ben Zvi | Sandra Braun | Aeyal Raz | Shlomi Laufer

Medical simulators provide a controlled environment for training and assessing clinical skills. However, as an assessment platform, it requires the presence of an experienced examiner to provide performance feedback, commonly preformed using a task specific checklist. This makes the assessment process inefficient and expensive. Furthermore, this evaluation method does not provide medical practitioners the opportunity for independent training. Ideally, the process of filling the checklist should be done by a fully-aware objective system, capable of recognizing and monitoring the clinical performance. To this end, we have developed an autonomous and a fully automatic speech-based checklist system, capable of objectively identifying and validating anesthesia residents’ actions in a simulation environment. Based on the analyzed results, our system is capable of recognizing most of the tasks in the checklist: F1 score of 0.77 for all of the tasks, and F1 score of 0.79 for the verbal tasks. Developing an audio-based system will improve the experience of a wide range of simulation platforms. Furthermore, in the future, this approach may be implemented in the operation room and emergency room. This could facilitate the development of automatic assistive technologies for these domains.

pdf
Assertion Detection in Clinical Notes: Medical Language Models to the Rescue?
Betty van Aken | Ivana Trajanovska | Amy Siu | Manuel Mayrdorfer | Klemens Budde | Alexander Loeser

In order to provide high-quality care, health professionals must efficiently identify the presence, possibility, or absence of symptoms, treatments and other relevant entities in free-text clinical notes. Such is the task of assertion detection - to identify the assertion class (present, possible, absent) of an entity based on textual cues in unstructured text. We evaluate state-of-the-art medical language models on the task and show that they outperform the baselines in all three classes. As transferability is especially important in the medical domain we further study how the best performing model behaves on unseen data from two other medical datasets. For this purpose we introduce a newly annotated set of 5,000 assertions for the publicly available MIMIC-III dataset. We conclude with an error analysis that reveals situations in which the models still go wrong and points towards future research directions.

pdf
Extracting Appointment Spans from Medical Conversations
Nimshi Venkat Meripo | Sandeep Konam

Extracting structured information from medical conversations can reduce the documentation burden for doctors and help patients follow through with their care plan. In this paper, we introduce a novel task of extracting appointment spans from medical conversations. We frame this task as a sequence tagging problem and focus on extracting spans for appointment reason and time. However, annotating medical conversations is expensive, time-consuming, and requires considerable domain expertise. Hence, we propose to leverage weak supervision approaches, namely incomplete supervision, inaccurate supervision, and a hybrid supervision approach and evaluate both generic and domain-specific, ELMo, and BERT embeddings using sequence tagging models. The best performing model is the domain-specific BERT variant using weak hybrid supervision and obtains an F1 score of 79.32.

pdf
Building blocks of a task-oriented dialogue system in the healthcare domain
Heereen Shim | Dietwig Lowet | Stijn Luca | Bart Vanrumste

There has been significant progress in dialogue systems research. However, dialogue systems research in the healthcare domain is still in its infancy. In this paper, we analyse recent studies and outline three building blocks of a task-oriented dialogue system in the healthcare domain: i) privacy-preserving data collection; ii) medical knowledge-grounded dialogue management; and iii) human-centric evaluations. To this end, we propose a framework for developing a dialogue system and show preliminary results of simulated dialogue data generation by utilising expert knowledge and crowd-sourcing.

pdf
Joint Summarization-Entailment Optimization for Consumer Health Question Understanding
Khalil Mrini | Franck Dernoncourt | Walter Chang | Emilia Farcas | Ndapa Nakashole

Understanding the intent of medical questions asked by patients, or Consumer Health Questions, is an essential skill for medical Conversational AI systems. We propose a novel data-augmented and simple joint learning approach combining question summarization and Recognizing Question Entailment (RQE) in the medical domain. Our data augmentation approach enables to use just one dataset for joint learning. We show improvements on both tasks across four biomedical datasets in accuracy (+8%), ROUGE-1 (+2.5%) and human evaluation scores. Human evaluation shows joint learning generates faithful and informative summaries. Finally, we release our code, the two question summarization datasets extracted from a large-scale medical dialogue dataset, as well as our augmented datasets.

pdf
Medically Aware GPT-3 as a Data Generator for Medical Dialogue Summarization
Bharath Chintagunta | Namit Katariya | Xavier Amatriain | Anitha Kannan

In medical dialogue summarization, summaries must be coherent and must capture all the medically relevant information in the dialogue. However, learning effective models for summarization require large amounts of labeled data which is especially hard to obtain. We present an algorithm to create synthetic training data with an explicit focus on capturing medically relevant information. We utilize GPT-3 as the backbone of our algorithm and scale 210 human labeled examples to yield results comparable to using 6400 human labeled examples (~30x) leveraging low-shot learning and an ensemble method. In detailed experiments, we show that this approach produces high quality training data that can further be combined with human labeled data to get summaries that are strongly preferable to those produced by models trained on human data alone both in terms of medical accuracy and coherency.