Jeffrey P. Bigham

Also published as: Jeffrey Bigham

2021

pdf bib
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation
Prakhar Gupta | Yulia Tsvetkov | Jeffrey Bigham
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib abs
Does Pretraining for Summarization Require Knowledge Transfer?
Kundan Krishna | Jeffrey Bigham | Zachary C. Lipton
Findings of the Association for Computational Linguistics: EMNLP 2021

Pretraining techniques leveraging enormous datasets have driven recent advances in text summarization. While folk explanations suggest that knowledge transfer accounts for pretraining’s benefits, little is known about why it works or what makes a pretraining task or dataset suitable. In this paper, we challenge the knowledge transfer story, showing that pretraining on documents consisting of character n-grams selected at random, we can nearly match the performance of models pretrained on real corpora. This work holds the promise of eliminating upstream corpora, which may alleviate some concerns over offensive language, bias, and copyright issues. To see whether the small residual benefit of using real data could be accounted for by the structure of the pretraining task, we design several tasks motivated by a qualitative study of summarization corpora. However, these tasks confer no appreciable benefit, leaving open the possibility of a small role for knowledge transfer.

pdf bib abs
Controlling Dialogue Generation with Semantic Exemplars
Prakhar Gupta | Jeffrey Bigham | Yulia Tsvetkov | Amy Pavel
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Dialogue systems pretrained with large language models generate locally coherent responses, but lack fine-grained control over responses necessary to achieve specific goals. A promising method to control response generation is exemplar-based generation, in which models edit exemplar responses that are retrieved from training data, or hand-written to strategically address discourse-level goals, to fit new dialogue contexts. We present an Exemplar-based Dialogue Generation model, EDGE, that uses the semantic frames present in exemplar responses to guide response generation. We show that controlling dialogue generation based on the semantic frames of exemplars improves the coherence of generated responses, while preserving semantic meaning and conversation goals present in exemplar responses.

pdf bib abs
Generating SOAP Notes from Doctor-Patient Conversations Using Modular Summarization Techniques
Kundan Krishna | Sopan Khosla | Jeffrey Bigham | Zachary C. Lipton
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Following each patient visit, physicians draft long semi-structured clinical summaries called SOAP notes. While invaluable to clinicians and researchers, creating digital SOAP notes is burdensome, contributing to physician burnout. In this paper, we introduce the first complete pipelines to leverage deep summarization models to generate these notes based on transcripts of conversations between physicians and patients. After exploring a spectrum of methods across the extractive-abstractive spectrum, we propose Cluster2Sent, an algorithm that (i) extracts important utterances relevant to each summary section; (ii) clusters together related utterances; and then (iii) generates one summary sentence per cluster. Cluster2Sent outperforms its purely abstractive counterpart by 8 ROUGE-1 points, and produces significantly more factual and coherent sentences as assessed by expert human evaluators. For reproducibility, we demonstrate similar benefits on the publicly available AMI dataset. Our results speak to the benefits of structuring summaries into sections and annotating supporting evidence when constructing summarization corpora.

2019

pdf bib abs
Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References
Prakhar Gupta | Shikib Mehri | Tiancheng Zhao | Amy Pavel | Maxine Eskenazi | Jeffrey Bigham
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation. Existing metrics have been shown to correlate poorly with human judgement, particularly in open-domain dialog. One alternative is to collect human annotations for evaluation, which can be expensive and time consuming. To demonstrate the effectiveness of multi-reference evaluation, we augment the test set of DailyDialog with multiple references. A series of experiments show that the use of multiple references results in improved correlation between several automatic metrics and human judgement for both the quality and the diversity of system output.

Jeffrey P. Bigham

2021

2019

2013

2006

Co-authors

Venues