Elisa Ferracane

2024

pdf abs
Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends
Sanjana Ramprasad | Elisa Ferracane | Zachary Lipton
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advancements in large language models (LLMs) have significantly advanced the capabilities of summarization systems.However, they continue to face a persistent challenge: hallucination. While prior work has extensively examined LLMs in news domains, evaluation of dialogue summarization has primarily focused on BART-based models, resulting in a notable gap in understanding LLM effectiveness.Our work seeks to address this gap by benchmarking LLMs for dialogue summarization faithfulness using human annotations,focusing on identifying and categorizing span-level inconsistencies.Specifically, we evaluate two prominent LLMs: GPT-4 and Alpaca-13B.Our evaluation reveals that LLMs often generate plausible, but not fully supported inferences based on conversation contextual cues, a trait absent in older models. As a result, we propose a refined taxonomy of errors, introducing a novel category termed “Contextual Inference” to address this aspect of LLM behavior. Using our taxonomy, we compare the behavioral differences between LLMs and older fine-tuned models. Additionally, we systematically assess the efficacy of automatic error detection methods on LLM summaries and find that they struggle to detect these nuanced errors effectively. To address this, we introduce two prompt-based approaches for fine-grained error detection. Our methods outperform existing metrics, particularly in identifying the novel “Contextual Inference” error type.

2021

pdf abs
Did they answer? Subjective acts and intents in conversational discourse
Elisa Ferracane | Greg Durrett | Junyi Jessy Li | Katrin Erk
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Discourse signals are often implicit, leaving it up to the interpreter to draw the required inferences. At the same time, discourse is embedded in a social context, meaning that interpreters apply their own assumptions and beliefs when resolving these inferences, leading to multiple, valid interpretations. However, current discourse data and frameworks ignore the social aspect, expecting only a single ground truth. We present the first discourse dataset with multiple and subjective interpretations of English conversation in the form of perceived conversation acts and intents. We carefully analyze our dataset and create computational models to (1) confirm our hypothesis that taking into account the bias of the interpreters leads to better predictions of the interpretations, (2) and show disagreements are nuanced and require a deeper understanding of the different contextual factors. We share our dataset and code at http://github.com/elisaF/subjective_discourse.

2019

pdf abs
Evaluating Discourse in Structured Text Representations
Elisa Ferracane | Greg Durrett | Junyi Jessy Li | Katrin Erk
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Discourse structure is integral to understanding a text and is helpful in many NLP tasks. Learning latent representations of discourse is an attractive alternative to acquiring expensive labeled discourse data. Liu and Lapata (2018) propose a structured attention mechanism for text classification that derives a tree over a text, akin to an RST discourse tree. We examine this model in detail, and evaluate on additional discourse-relevant tasks and datasets, in order to assess whether the structured attention improves performance on the end task and whether it captures a text’s discourse structure. We find the learned latent trees have little to no structure and instead focus on lexical cues; even after obtaining more structured trees with proposed model modifications, the trees are still far from capturing discourse structure when compared to discourse dependency trees from an existing discourse parser. Finally, ablation studies show the structured attention provides little benefit, sometimes even hurting performance.

pdf abs
From News to Medical: Cross-domain Discourse Segmentation
Elisa Ferracane | Titan Page | Junyi Jessy Li | Katrin Erk
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

The first step in discourse analysis involves dividing a text into segments. We annotate the first high-quality small-scale medical corpus in English with discourse segments and analyze how well news-trained segmenters perform on this domain. While we expectedly find a drop in performance, the nature of the segmentation errors suggests some problems can be addressed earlier in the pipeline, while others would require expanding the corpus to a trainable size to learn the nuances of the medical domain.

2017

pdf abs
Leveraging Discourse Information Effectively for Authorship Attribution
Elisa Ferracane | Su Wang | Raymond Mooney
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

We explore techniques to maximize the effectiveness of discourse information in the task of authorship attribution. We present a novel method to embed discourse features in a Convolutional Neural Network text classifier, which achieves a state-of-the-art result by a significant margin. We empirically investigate several featurization methods to understand the conditions under which discourse features contribute non-trivial performance gains, and analyze discourse embeddings.