Izhak Shafran


2022

Unsupervised Slot Schema Induction for Task-oriented Dialog
Dian Yu | Mingqiu Wang | Yuan Cao | Izhak Shafran | Laurent El Shafey | Hagen Soltau
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Carefully designed schemas describing how to collect and annotate dialog corpora are a prerequisite for building task-oriented dialog systems. In practical applications, manually designing schemas can be error-prone, laborious, iterative, and slow, especially when the schema is complicated. To alleviate this expensive and time-consuming process, we propose an unsupervised approach for slot schema induction from unlabeled dialog corpora. Leveraging in-domain language models and unsupervised parsing structures, our data-driven approach extracts candidate slots without constraints, followed by coarse-to-fine clustering to induce slot types. We compare our method against several strong supervised baselines, and show significant performance improvement in slot schema induction on MultiWoz and SGD datasets. We also demonstrate the effectiveness of induced schemas on downstream applications including dialog state tracking and response generation.

Knowledge-grounded Dialog State Tracking
Dian Yu | Mingqiu Wang | Yuan Cao | Laurent El Shafey | Izhak Shafran | Hagen Soltau
Findings of the Association for Computational Linguistics: EMNLP 2022

Knowledge (including structured knowledge such as schema and ontology and unstructured knowledge such as web corpus) is a critical part of dialog understanding, especially for unseen tasks and domains. Traditionally, such domain-specific knowledge is encoded implicitly into model parameters for the execution of downstream tasks, which makes training inefficient. In addition, such models are not easily transferable to new tasks with different schemas. In this work, we propose to perform dialog state tracking grounded on knowledge encoded externally. We query relevant knowledge of various forms based on the dialog context, where such information can ground the prediction of dialog states. We demonstrate superior performance of our proposed method over strong baselines, especially in the few-shot learning setting.

2020

Proceedings of the First Workshop on Natural Language Processing for Medical Conversations
Parminder Bhatia | Steven Lin | Rashmi Gangadharaiah | Byron Wallace | Izhak Shafran | Chaitanya Shivade | Nan Du | Mona Diab
Proceedings of the First Workshop on Natural Language Processing for Medical Conversations

The Medical Scribe: Corpus Development and Model Performance Analyses
Izhak Shafran | Nan Du | Linh Tran | Amanda Perry | Lauren Keyes | Mark Knichel | Ashley Domin | Lei Huang | Yu-hui Chen | Gang Li | Mingqiu Wang | Laurent El Shafey | Hagen Soltau | Justin Stuart Paul
Proceedings of the Twelfth Language Resources and Evaluation Conference

There is a growing interest in creating tools to assist in clinical note generation using the audio of provider-patient encounters. Motivated by this goal and with the help of providers and medical scribes, we developed an annotation scheme to extract relevant clinical concepts. We used this annotation scheme to label a corpus of about 6k clinical encounters. This was used to train a state-of-the-art tagging model. We report ontologies, labeling results, model performances, and detailed analyses of the results. Our results show that the entities related to medications can be extracted with a relatively high accuracy of 0.90 F-score, followed by symptoms at 0.72 F-score, and conditions at 0.57 F-score. In our task, we not only identify where the symptoms are mentioned but also map them to canonical forms as they appear in the clinical notes. Of the different types of errors, in about 19-38% of the cases, we find that the model output was correct, and about 17-32% of the errors do not impact the clinical note. Taken together, the models developed in this work are more useful than the F-scores reflect, making it a promising approach for practical applications.

2019

Audio De-identification - a New Entity Recognition Task
Ido Cohn | Itay Laish | Genady Beryozkin | Gang Li | Izhak Shafran | Idan Szpektor | Tzvika Hartman | Avinatan Hassidim | Yossi Matias
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Named Entity Recognition (NER) has been mostly studied in the context of written text. Specifically, NER is an important step in de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans with personal information should be redacted, similar to the redaction of sensitive character spans in de-ID for written text. The application of NER in the context of audio de-identification has yet to be fully investigated. To this end, we define the task of audio de-ID, in which audio spans with entity mentions should be detected. We then present our pipeline for this task, which involves Automatic Speech Recognition (ASR), NER on the transcript text, and text-to-audio alignment. Finally, we introduce a novel metric for audio de-ID and a new evaluation benchmark consisting of a large labeled segment of the Switchboard and Fisher audio datasets and detail our pipeline’s results on it.

Extracting Symptoms and their Status from Clinical Conversations
Nan Du | Kai Chen | Anjuli Kannan | Linh Tran | Yuhui Chen | Izhak Shafran
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

This paper describes novel models tailored for a new application, that of extracting the symptoms mentioned in clinical conversations along with their status. Lack of any publicly available corpus in this privacy-sensitive domain led us to develop our own corpus, consisting of about 3K conversations annotated by professional medical scribes. We propose two novel deep learning approaches to infer the symptom names and their status: (1) a new hierarchical span-attribute tagging (SA-T) model, trained using curriculum learning, and (2) a variant of sequence-to-sequence model which decodes the symptoms and their status from a few speaker turns within a sliding window over the conversation. This task stems from a realistic application of assisting medical providers in capturing symptoms mentioned by patients from their clinical conversations. To reflect this application, we define multiple metrics. From inter-rater agreement, we find that the task is inherently difficult. We conduct comprehensive evaluations on several contrasting conditions and observe that the performance of the models ranges from an F-score of 0.5 to 0.8 depending on the condition. Our analysis not only reveals the inherent challenges of the task, but also provides useful directions to improve the models.

Learning to Infer Entities, Properties and their Relations from Clinical Conversations
Nan Du | Mingqiu Wang | Linh Tran | Gang Lee | Izhak Shafran
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Recently we proposed the Span Attribute Tagging (SAT) Model to infer clinical entities (e.g., symptoms) and their properties (e.g., duration). It tackles the challenge of large label space and limited training data using a hierarchical two-stage approach that identifies the span of interest in a tagging step and assigns labels to the span in a classification step. We extend the SAT model to jointly infer not only entities and their properties but also relations between them. Most relation extraction models restrict inference to relations between tokens within a few neighboring sentences, mainly to avoid high computational complexity. In contrast, our proposed Relation-SAT (R-SAT) model is computationally efficient and can infer relations over the entire conversation, spanning an average duration of 10 minutes. We evaluate our model on a corpus of clinical conversations. When the entities are given, the R-SAT outperforms baselines in identifying relations between symptoms and their properties by about 32% (0.82 vs 0.62 F-score) and by about 50% (0.60 vs 0.41 F-score) on medications and their properties. On the more difficult task of jointly inferring entities and relations, the R-SAT model achieves a performance of 0.34 and 0.45 for symptoms and medications respectively, which is significantly better than 0.18 and 0.35 for the baseline model. The contributions of different components of the model are quantified using ablation analysis.

2014

Detecting Health Related Discussions in Everyday Telephone Conversations for Studying Medical Events in the Lives of Older Adults
Golnar Sheikhshab | Izhak Shafran | Jeffrey Kaye
Proceedings of BioNLP 2014

Applications of Lexicographic Semirings to Problems in Speech and Language Processing
Richard Sproat | Mahsa Yarmohammadi | Izhak Shafran | Brian Roark
Computational Linguistics, Volume 40, Issue 4 - December 2014

2013

Discriminative Joint Modeling of Lexical Variation and Acoustic Confusion for Automated Narrative Retelling Assessment
Maider Lehr | Izhak Shafran | Emily Prud’hommeaux | Brian Roark
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Hello, Who is Calling?: Can Words Reveal the Social Nature of Conversations?
Anthony Stark | Izhak Shafran | Jeffrey Kaye
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Lexicographic Semirings for Exact Automata Encoding of Sequence Models
Brian Roark | Richard Sproat | Izhak Shafran
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2006

SParseval: Evaluation Metrics for Parsing Speech
Brian Roark | Mary Harper | Eugene Charniak | Bonnie Dorr | Mark Johnson | Jeremy Kahn | Yang Liu | Mari Ostendorf | John Hale | Anna Krasnyanskaya | Matthew Lease | Izhak Shafran | Matthew Snover | Robin Stewart | Lisa Yung
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

While both spoken and written language processing stand to benefit from parsing, the standard Parseval metrics (Black et al., 1991) and their canonical implementation (Sekine and Collins, 1997) are only useful for text. The Parseval metrics are undefined when the words input to the parser do not match the words in the gold standard parse tree exactly, and word errors are unavoidable with automatic speech recognition (ASR) systems. To fill this gap, we have developed a publicly available tool for scoring parses that implements a variety of metrics which can handle mismatches in words and segmentations, including: alignment-based bracket evaluation, alignment-based dependency evaluation, and a dependency evaluation that does not require alignment. We describe the different metrics, how to use the tool, and the outcome of an extensive set of experiments on the sensitivity.

PCFGs with Syntactic and Prosodic Indicators of Speech Repairs
John Hale | Izhak Shafran | Lisa Yung | Bonnie J. Dorr | Mary Harper | Anna Krasnyanskaya | Matthew Lease | Yang Liu | Brian Roark | Matthew Snover | Robin Stewart
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Corrective Models for Speech Recognition of Inflected Languages
Izhak Shafran | Keith Hall
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing