Marilyn Walker

Also published as: M. A. Walker, Marilyn A. Walker

2023

Controllable Generation of Dialogue Acts for Dialogue Systems via Few-Shot Response Generation and Ranking
Angela Ramirez | Kartik Agarwal | Juraj Juraska | Utkarsh Garg | Marilyn Walker
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Dialogue systems need to produce responses that realize multiple types of dialogue acts (DAs) with high semantic fidelity. In the past, natural language generators (NLGs) for dialogue were trained on large parallel corpora that map from a domain-specific DA and its semantic attributes to an output utterance. Recent work shows that pretrained language models (LLMs) offer new possibilities for controllable NLG using prompt-based learning. Here we develop a novel few-shot overgenerate-and-rank approach that achieves the controlled generation of DAs. We compare eight few-shot prompt styles that include a novel method of generating from textual pseudo-references using a textual style transfer approach. We develop six automatic ranking functions that identify outputs with both the correct DA and high semantic accuracy at generation time. We test our approach on three domains and four LLMs. To our knowledge, this is the first work on NLG for dialogue that automatically ranks outputs using both DA and attribute accuracy. For completeness, we compare our results to fine-tuned few-shot models trained with 5 to 100 instances per DA. Our results show that several prompt settings achieve perfect DA accuracy and near-perfect semantic accuracy (99.81%), and perform better than few-shot fine-tuning.

2022

OpenEL: An Annotated Corpus for Entity Linking and Discourse in Open Domain Dialogue
Wen Cui | Leanne Rolston | Marilyn Walker | Beth Ann Hockey
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Entity linking in dialogue is the task of mapping entity mentions in utterances to a target knowledge base. Prior work on entity linking has mainly focused on well-written articles such as Wikipedia, annotated newswire, or domain-specific datasets. We extend the study of entity linking to open domain dialogue by presenting the OpenEL corpus: an annotated multi-domain corpus for linking entities in natural conversation to Wikidata. Each dialogic utterance in 179 dialogues over 12 topics from the EDINA dataset has been annotated for entities realized by definite referring expressions as well as anaphoric forms such as he, she, it and they. This dataset supports training and evaluation of entity linking in open-domain dialogue, as well as analysis of the effect of using dialogue context and anaphora resolution in model training. It could also be used for fine-tuning a coreference resolution algorithm. To the best of our knowledge, this is the first substantial entity linking corpus publicly available for open-domain dialogue. We also establish baselines for this task using several existing entity linking systems. We found that the Transformer-based system Flair + BLINK has the best performance with a 0.65 F1 score. Our results show that dialogue context is extremely beneficial for entity linking in conversations, with Flair + BLINK achieving an F1 of only 0.61 without discourse context. These results also demonstrate the remaining performance gap between the baselines and human performance, highlighting the challenges of entity linking in open-domain dialogue, and suggesting many avenues for future research using OpenEL.

2021

Attention Is Indeed All You Need: Semantically Attention-Guided Decoding for Data-to-Text NLG
Juraj Juraska | Marilyn Walker
Proceedings of the 14th International Conference on Natural Language Generation

Ever since neural models were adopted in data-to-text language generation, they have invariably been reliant on extrinsic components to improve their semantic accuracy, because the models normally do not exhibit the ability to generate text that reliably mentions all of the information provided in the input. In this paper, we propose a novel decoding method that extracts interpretable information from encoder-decoder models’ cross-attention, and uses it to infer which attributes are mentioned in the generated text, which is subsequently used to rescore beam hypotheses. Using this decoding method with T5 and BART, we show on three datasets its ability to dramatically reduce semantic errors in the generated outputs, while maintaining their state-of-the-art quality.

Athena 2.0 is an Alexa Prize SocialBot that has been a finalist in the last two Alexa Prize Grand Challenges. One reason for Athena’s success is its novel dialogue management strategy, which allows it to dynamically construct dialogues and responses from component modules, leading to novel conversations with every interaction. Here we describe Athena’s system design and performance in the Alexa Prize during the 20/21 competition. A live demo of Athena as well as video recordings will provoke discussion on the state of the art in conversational AI.

2020

Bridging the Structural Gap Between Encoding and Decoding for Data-To-Text Generation
Chao Zhao | Marilyn Walker | Snigdha Chaturvedi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Generating sequential natural language descriptions from graph-structured data (e.g., knowledge graph) is challenging, partly because of the structural differences between the input graph and the output text. Hence, popular sequence-to-sequence models, which require serialized input, are not a natural fit for this task. Graph neural networks, on the other hand, can better encode the input graph but broaden the structural gap between the encoder and decoder, making faithful generation difficult. To narrow this gap, we propose DualEnc, a dual encoding model that can not only incorporate the graph structure, but can also cater to the linear structure of the output text. Empirical comparisons with strong single-encoder baselines demonstrate that dual encoding can significantly improve the quality of the generated text.

Learning from Mistakes: Combining Ontologies via Self-Training for Dialogue Generation
Lena Reed | Vrindavan Harrison | Shereen Oraby | Dilek Hakkani-Tur | Marilyn Walker
Proceedings of the 21st Annual Meeting of the Special Interest Group on Discourse and Dialogue

Natural language generators (NLGs) for task-oriented dialogue typically take a meaning representation (MR) as input, and are trained end-to-end with a corpus of MR/utterance pairs, where the MRs cover a specific set of dialogue acts and domain attributes. Creation of such datasets is labor intensive and time consuming. Therefore, dialogue systems for new domain ontologies would benefit from using data for pre-existing ontologies. Here we explore, for the first time, whether it is possible to train an NLG for a new larger ontology using existing training sets for the restaurant domain, where each set is based on a different ontology. We create a new, larger combined ontology, and then train an NLG to produce utterances covering it. For example, if one dataset has attributes for family friendly and rating information, and the other has attributes for decor and service, our aim is an NLG for the combined ontology that can produce utterances that realize values for family friendly, rating, decor and service. Initial experiments with a baseline neural sequence-to-sequence model show that this task is surprisingly challenging. We then develop a novel self-training method that identifies (errorful) model outputs, automatically constructs a corrected MR input to form a new (MR, utterance) training pair, and then repeatedly adds these new instances back into the training data. We then test the resulting model on a new test set. The result is a self-trained model whose performance is an absolute 75.4% improvement over the baseline model. We also report a human qualitative evaluation of the final model showing that it achieves high naturalness, semantic coherence and grammaticality.

2019

Maximizing Stylistic Control and Semantic Accuracy in NLG: Personality Variation and Discourse Contrast
Vrindavan Harrison | Lena Reed | Shereen Oraby | Marilyn Walker
Proceedings of the 1st Workshop on Discourse Structure in Neural NLG

Neural generation methods for task-oriented dialogue typically generate from a meaning representation that is populated using a database of domain information, such as a table of data describing a restaurant. While earlier work focused solely on the semantic fidelity of outputs, recent work has started to explore methods for controlling the style of the generated text while simultaneously achieving semantic accuracy. Here we experiment with two stylistic benchmark tasks, generating language that exhibits variation in personality, and generating discourse contrast. We report a huge performance improvement in both stylistic control and semantic accuracy over the state of the art on both of these benchmarks. We test several different models and show that putting stylistic conditioning in the decoder and eliminating the semantic re-ranker used in earlier models results in more than 15 points higher BLEU for Personality, with a reduction of semantic error to near zero. We also report an improvement from .75 to .81 in controlling contrast and a reduction in semantic error from 16% to 2%.

ViGGO: A Video Game Corpus for Data-To-Text Generation in Open-Domain Conversation
Juraj Juraska | Kevin Bowden | Marilyn Walker
Proceedings of the 12th International Conference on Natural Language Generation

The uptake of deep learning in natural language generation (NLG) led to the release of both small and relatively large parallel corpora for training neural models. The existing data-to-text datasets are, however, aimed at task-oriented dialogue systems, and often thus limited in diversity and versatility. They are typically crowdsourced, with much of the noise left in them. Moreover, current neural NLG models do not take full advantage of large training data, and due to their strong generalizing properties produce sentences that look template-like regardless. We therefore present a new corpus of 7K samples, which (1) is clean despite being crowdsourced, (2) has utterances of 9 generalizable and conversational dialogue act types, making it more suitable for open-domain dialogue systems, and (3) explores the domain of video games, which is new to dialogue systems despite having excellent potential for supporting rich conversations.

Implicit Discourse Relation Identification for Open-domain Dialogues
Mingyu Derek Ma | Kevin Bowden | Jiaqi Wu | Wen Cui | Marilyn Walker
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Discourse relation identification has been an active area of research for many years, and the challenge of identifying implicit relations remains largely an unsolved task, especially in the context of an open-domain dialogue system. Previous work primarily relies on corpora of formal text that are inherently non-dialogic, i.e., news and journals. This data, however, is not suited to handling the nuances of informal dialogue, nor does it cover the plethora of valid topics present in open-domain dialogue. In this paper, we design a novel discourse relation identification pipeline specifically tuned for open-domain dialogue systems. We first propose a method to automatically extract the implicit discourse relation argument pairs and labels from a dataset of dialogic turns, resulting in a novel corpus of discourse relation pairs, the first of its kind to attempt to identify the discourse relations connecting the dialogic turns in open-domain discourse. Moreover, we have taken the first steps to leverage the dialogue features unique to our task to further improve the identification of such relations by performing feature ablation and incorporating dialogue features to enhance the state-of-the-art model.

Curate and Generate: A Corpus and Method for Joint Control of Semantics and Style in Neural NLG
Shereen Oraby | Vrindavan Harrison | Abteen Ebrahimi | Marilyn Walker
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural natural language generation (NNLG) from structured meaning representations has become increasingly popular in recent years. While we have seen progress with generating syntactically correct utterances that preserve semantics, various shortcomings of NNLG systems are clear: new tasks require new training data which is not available or straightforward to acquire, and model outputs are simple and may be dull and repetitive. This paper addresses these two critical challenges in NNLG by: (1) scalably (and at no cost) creating training datasets of parallel meaning representations and reference texts with rich style markup by using data from freely available and naturally descriptive user reviews, and (2) systematically exploring how the style markup enables joint control of semantic and stylistic aspects of neural model output. We present YelpNLG, a corpus of 300,000 rich, parallel meaning representations and highly stylistically varied reference texts spanning different restaurant attributes, and describe a novel methodology that can be scalably reused to generate NLG datasets for other domains. The experiments show that the models control important aspects, including lexical choice of adjectives, output length, and sentiment, allowing the models to successfully hit multiple style targets without sacrificing semantics.

2018

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Marilyn Walker | Heng Ji | Amanda Stent
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

A Deep Ensemble Model with Slot Alignment for Sequence-to-Sequence Natural Language Generation
Juraj Juraska | Panagiotis Karagiannis | Kevin Bowden | Marilyn Walker
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Natural language generation lies at the core of generative dialogue systems and conversational agents. We describe an ensemble neural language generator, and present several novel methods for data representation and augmentation that yield improved results in our model. We test the model on three datasets in the restaurant, TV and laptop domains, and report both objective and subjective evaluations of our best model. Using a range of automatic metrics, as well as human evaluators, we show that our approach achieves better results than state-of-the-art models on the same datasets.

Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
Marilyn Walker | Heng Ji | Amanda Stent
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Modeling Linguistic and Personality Adaptation for Natural Language Generation
Zhichao Hu | Jean Fox Tree | Marilyn Walker
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Previous work has shown that conversants adapt to many aspects of their partners’ language. Other work has shown that while every person is unique, they often share general patterns of behavior. Theories of personality aim to explain these shared patterns, and studies have shown that many linguistic cues are correlated with personality traits. We propose an adaptation measure for adaptive natural language generation for dialogs that integrates the predictions of both personality theories and adaptation theories, that can be applied as a dialog unfolds, on a turn by turn basis. We show that our measure meets criteria for validity, and that adaptation varies according to corpora and task, speaker, and the set of features used to model it. We also produce fine-grained models according to the dialog segmentation or the speaker, and demonstrate the decaying trend of adaptation.

Controlling Personality-Based Stylistic Variation with Neural Natural Language Generators
Shereen Oraby | Lena Reed | Shubhangi Tandon | Sharath T.S. | Stephanie Lukin | Marilyn Walker
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Natural language generators for task-oriented dialogue must effectively realize system dialogue actions and their associated semantics. In many applications, it is also desirable for generators to control the style of an utterance. To date, work on task-oriented neural generation has primarily focused on semantic fidelity rather than achieving stylistic goals, while work on style has been done in contexts where it is difficult to measure content preservation. Here we present three different sequence-to-sequence models and carefully test how well they disentangle content and style. We use a statistical generator, Personage, to synthesize a new corpus of over 88,000 restaurant domain utterances whose style varies according to models of personality, giving us total control over both the semantic content and the stylistic variation in the training data. We then vary the amount of explicit stylistic supervision given to the three models. We show that our most explicit model can simultaneously achieve high fidelity to both semantic and stylistic goals: this model adds a context vector of 36 stylistic parameters as input to the hidden state of the encoder at each time step, showing the benefits of explicit stylistic supervision, even when the amount of training data is large.

Can Neural Generators for Dialogue Learn Sentence Planning and Discourse Structuring?
Lena Reed | Shereen Oraby | Marilyn Walker
Proceedings of the 11th International Conference on Natural Language Generation

Responses in task-oriented dialogue systems often realize multiple propositions whose ultimate form depends on the use of sentence planning and discourse structuring operations. For example, a recommendation may consist of an explicitly evaluative utterance, e.g. Chanpen Thai is the best option, along with content related by the justification discourse relation, e.g. It has great food and service, that combines multiple propositions into a single phrase. While neural generation methods integrate sentence planning and surface realization in one end-to-end learning framework, previous work has not shown that neural generators can: (1) perform common sentence planning and discourse structuring operations; (2) make decisions as to whether to realize content in a single sentence or over multiple sentences; (3) generalize sentence planning and discourse relation operations beyond what was seen in training. We systematically create large training corpora that exhibit particular sentence planning operations and then test neural models to see what they learn. We compare models without explicit latent variables for sentence planning with ones that provide explicit supervision during training. We show that only the models with additional supervision can reproduce sentence planning and discourse operations and generalize to situations unseen in training.

Neural Generation of Diverse Questions using Answer Focus, Contextual and Linguistic Features
Vrindavan Harrison | Marilyn Walker
Proceedings of the 11th International Conference on Natural Language Generation

Question Generation is the task of automatically creating questions from textual input. In this work we present a new Attentional Encoder–Decoder Recurrent Neural Network model for automatic question generation. Our model incorporates linguistic features and an additional sentence embedding to capture meaning at both sentence and word levels. The linguistic features are designed to capture information related to named entity recognition, word case, and entity coreference resolution. In addition, our model uses a copying mechanism and a special answer signal that enables generation of numerous diverse questions on a given sentence. Our model achieves state-of-the-art results of 19.98 BLEU-4 on a benchmark Question Generation dataset, outperforming all previously published results by a significant margin. A human evaluation also shows that the added features improve the quality of the generated questions.

Characterizing Variation in Crowd-Sourced Data for Training Neural Language Generators to Produce Stylistically Varied Outputs
Juraj Juraska | Marilyn Walker
Proceedings of the 11th International Conference on Natural Language Generation

One of the biggest challenges of end-to-end language generation from meaning representations in dialogue systems is making the outputs more natural and varied. Here we take a large corpus of 50K crowd-sourced utterances in the restaurant domain and develop text analysis methods that systematically characterize types of sentences in the training data. We then automatically label the training data to allow us to conduct two kinds of experiments with a neural generator. First, we test the effect of training the system with different stylistic partitions and quantify the effect of smaller, but more stylistically controlled training data. Second, we propose a method of labeling the style variants during training, and show that we can modify the style of the generated utterances using our stylistic labels. We contrast and compare these methods that can be used with any existing large corpus, showing how they vary in terms of semantic quality and stylistic control.

Exploring Conversational Language Generation for Rich Content about Hotels
Marilyn Walker | Albry Smither | Shereen Oraby | Vrindavan Harrison | Hadar Shemtov
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

SlugNERDS: A Named Entity Recognition Tool for Open Domain Dialogue Systems
Kevin Bowden | Jiaqi Wu | Shereen Oraby | Amita Misra | Marilyn Walker
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Proceedings of the 27th International Conference on Computational Linguistics: Tutorial Abstracts
Donia Scott | Marilyn Walker | Pascale Fung
Proceedings of the 27th International Conference on Computational Linguistics: Tutorial Abstracts

2017

Argument Strength is in the Eye of the Beholder: Audience Effects in Persuasion
Stephanie Lukin | Pranav Anand | Marilyn Walker | Steve Whittaker
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Americans spend about a third of their time online, with many participating in online conversations on social and political issues. We hypothesize that social media arguments on such issues may be more engaging and persuasive than traditional media summaries, and that particular types of people may be more or less convinced by particular styles of argument, e.g. emotional arguments may resonate with some personalities while factual arguments resonate with others. We report a set of experiments testing at large scale how audience variables interact with argument style to affect the persuasiveness of an argument, an under-researched topic within natural language processing. We show that belief change is affected by personality factors, with conscientious, open and agreeable people being more convinced by emotional arguments.

Inference of Fine-Grained Event Causality from Blogs and Films
Zhichao Hu | Elahe Rahimtoroghi | Marilyn Walker
Proceedings of the Events and Stories in the News Workshop

Human understanding of narrative is mainly driven by reasoning about causal relations between events, and thus recognizing them is a key capability for computational models of language understanding. Computational work in this area has approached this via two different routes: by focusing on acquiring a knowledge base of common causal relations between events, or by attempting to understand a particular story or macro-event, along with its storyline. In this position paper, we focus on the knowledge acquisition approach and claim that newswire is a relatively poor source for learning fine-grained causal relations between everyday events. We describe experiments using an unsupervised method to learn causal relations between events in the narrative genres of first-person narratives and film scene descriptions. We show that our method learns fine-grained causal relations, judged by humans as likely to be causal over 80% of the time. We also demonstrate that the learned event pairs do not exist in publicly available event-pair datasets extracted from newswire.

Harvesting Creative Templates for Generating Stylistically Varied Restaurant Reviews
Shereen Oraby | Sheideh Homayon | Marilyn Walker
Proceedings of the Workshop on Stylistic Variation

Many of the creative and figurative elements that make language exciting are lost in translation in current natural language generation engines. In this paper, we explore a method to harvest templates from positive and negative reviews in the restaurant domain, with the goal of vastly expanding the types of stylistic variation available to the natural language generator. We learn hyperbolic adjective patterns that are representative of the strongly-valenced expressive language commonly used in either positive or negative reviews. We then identify and delexicalize entities, and use heuristics to extract generation templates from review sentences. We evaluate the learned templates against more traditional review templates, using subjective measures of convincingness, interestingness, and naturalness. Our results show that the learned templates score highly on these measures. Finally, we analyze the linguistic categories that characterize the learned positive and negative templates. We plan to use the learned templates to improve the conversational style of dialogue systems in the restaurant domain.

Stylistic Variation in Television Dialogue for Natural Language Generation
Grace Lin | Marilyn Walker
Proceedings of the Workshop on Stylistic Variation

Conversation is a critical component of storytelling, where key information is often revealed by what a character says and how they say it. We focus on the issue of character voice and build stylistic models with linguistic features related to natural language generation decisions. Using a dialogue corpus of the television series The Big Bang Theory, we apply content analysis to extract relevant linguistic features to build character-based stylistic models, and we test the model fit through a user perceptual experiment with Amazon’s Mechanical Turk. The results are encouraging in that human subjects tend to perceive the generated utterances as being more similar to the character they are modeled on than to another random character.

Linguistic Reflexes of Well-Being and Happiness in Echo
Jiaqi Wu | Marilyn Walker | Pranav Anand | Steve Whittaker
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Different theories posit different sources for feelings of well-being and happiness. Appraisal theory grounds our emotional responses in our goals and desires and their fulfillment, or lack of fulfillment. Self-Determination theory posits that the basis for well-being rests on our assessments of our competence, autonomy and social connection. And surveys that measure happiness empirically note that people require their basic needs to be met for food and shelter, but beyond that tend to be happiest when socializing, eating or having sex. We analyze a corpus of private micro-blogs from a well-being application called Echo, where users label each written post about daily events with a happiness score between 1 and 9. Our goal is to ground the linguistic descriptions of events that users experience in theories of well-being and happiness, and then examine the extent to which different theoretical accounts can explain the variance in the happiness scores. We show that recurrent event types, such as obligation and incompetence, which affect people’s feelings of well-being are not captured in current lexical or semantic resources.

Are you serious?: Rhetorical Questions and Sarcasm in Social Media Dialog
Shereen Oraby | Vrindavan Harrison | Amita Misra | Ellen Riloff | Marilyn Walker
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Effective models of social dialog must understand a broad range of rhetorical and figurative devices. Rhetorical questions (RQs) are a type of figurative language whose aim is to achieve a pragmatic goal, such as structuring an argument, being persuasive, emphasizing a point, or being ironic. While there are computational models for other forms of figurative language, rhetorical questions have received little attention to date. We expand a small dataset from previous work, presenting a corpus of 10,270 RQs from debate forums and Twitter that represent different discourse functions. We show that we can clearly distinguish between RQs and sincere questions (0.76 F1). We then show that RQs can be used both sarcastically and non-sarcastically, observing that non-sarcastic (other) uses of RQs are frequently argumentative in forums, and persuasive in tweets. We present experiments to distinguish between these uses of RQs using SVM and LSTM models that represent linguistic features and post-level context, achieving results as high as 0.76 F1 for “sarcastic” and 0.77 F1 for “other” in forums, and 0.83 F1 for both “sarcastic” and “other” in tweets. We supplement our quantitative experiments with an in-depth characterization of the linguistic variation in RQs.

Inferring Narrative Causality between Event Pairs in Films
Zhichao Hu | Marilyn Walker
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

To understand narrative, humans draw inferences about the underlying relations between narrative events. Cognitive theories of narrative understanding define these inferences as four different types of causality, ranging from pairs of events A, B where A physically causes B (X drop, X break), to pairs of events where A causes emotional state B (Y saw X, Y felt fear). Previous work on learning narrative relations from text has either focused on “strict” physical causality, or has been vague about what relation is being learned. This paper learns pairs of causal events from a corpus of film scene descriptions, which are action rich and tend to be told in chronological order. We show that event pairs induced using our methods are of high quality and are judged to have a stronger causal relation than event pairs from Rel-Grams.

Modelling Protagonist Goals and Desires in First-Person Narrative
Elahe Rahimtoroghi | Jiaqi Wu | Ruimin Wang | Pranav Anand | Marilyn Walker
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Many genres of natural language text are narratively structured, a testament to our predilection for organizing our experiences as narratives. There is broad consensus that understanding a narrative requires identifying and tracking the goals and desires of the characters and their narrative outcomes. However, to date, there has been limited work on computational models for this problem. We introduce a new dataset, DesireDB, which includes gold-standard labels for identifying statements of desire, textual evidence for desire fulfillment, and annotations for whether the stated desire is fulfilled given the evidence in the narrative context. We report experiments on tracking desire fulfillment using different methods, and show that an LSTM Skip-Thought model achieves an F-measure of 0.7 on our corpus.

pdf abs
Learning Lexico-Functional Patterns for First-Person Affect
Lena Reed | Jiaqi Wu | Shereen Oraby | Pranav Anand | Marilyn Walker
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Informal first-person narratives are a unique resource for computational models of everyday events and people’s affective reactions to them. People blogging about their day tend not to explicitly say “I am happy”. Instead, they describe situations from which other humans can readily infer their affective reactions. However, current sentiment dictionaries are missing much of the information needed to make similar inferences. We build on recent work that models affect in terms of lexical predicate functions and affect on the predicate’s arguments. We present a method to learn proxies for these functions from first-person narratives. We construct a novel fine-grained test set, and show that the patterns we learn improve our ability to predict first-person affective reactions to everyday events, from a Stanford sentiment baseline of .67F to .75F.

2016

pdf
Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue
Shereen Oraby | Vrindavan Harrison | Lena Reed | Ernesto Hernandez | Ellen Riloff | Marilyn Walker
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Measuring the Similarity of Sentential Arguments in Dialogue
Amita Misra | Brian Ecker | Marilyn Walker
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Learning Fine-Grained Knowledge about Contingent Relations between Everyday Events
Elahe Rahimtoroghi | Ernesto Hernandez | Marilyn Walker
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf abs
PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs
Stephanie Lukin | Kevin Bowden | Casey Barackman | Marilyn Walker
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a new corpus, PersonaBank, consisting of 108 personal stories from weblogs that have been annotated with their Story Intention Graphs, a deep representation of the content of a story. We describe the topics of the stories and the basis of the Story Intention Graph representation, as well as the process of annotating the stories to produce the Story Intention Graphs and the challenges of adapting the tool to this new personal narrative domain. We also discuss how the corpus can be used in applications that retell the story using different styles of tellings, co-tellings, or as a content planner.

pdf abs
Coordinating Communication in the Wild: The Artwalk Dialogue Corpus of Pedestrian Navigation and Mobile Referential Communication
Kris Liu | Jean Fox Tree | Marilyn Walker
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Artwalk Corpus is a collection of 48 mobile phone conversations between 24 pairs of friends and 24 pairs of strangers performing a novel, naturalistically-situated referential communication task. This task produced dialogues which, on average, are just under 40 minutes. The task requires the identification of public art while walking around and navigating pedestrian routes in the downtown area of Santa Cruz, California. The task involves a Director on the UCSC campus with access to maps providing verbal instructions to a Follower executing the task. The task provides a setting for real-world situated dialogic language and is designed to: (1) elicit entrainment and coordination of referring expressions between the dialogue participants, (2) examine the effect of friendship on dialogue strategies, and (3) examine how the need to complete the task while negotiating myriad, unanticipated events in the real world ― such as avoiding cars and other pedestrians ― affects linguistic coordination and other dialogue behaviors. Previous work on entrainment and coordinating communication has primarily focused on similar tasks in laboratory settings where there are no interruptions and no need to navigate from one point to another in a complex space. The corpus provides a general resource for studies on how coordinated task-oriented dialogue changes when we move outside the laboratory and into the world. It can also be used for studies of entrainment in dialogue, and the form and style of pedestrian instruction dialogues, as well as the effect of friendship on dialogic behaviors.

pdf abs
A Corpus of Gesture-Annotated Dialogues for Monologue-to-Dialogue Generation from Personal Narratives
Zhichao Hu | Michelle Dick | Chung-Ning Chang | Kevin Bowden | Michael Neff | Jean Fox Tree | Marilyn Walker
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Story-telling is a fundamental and prevalent aspect of human social behavior. In the wild, stories are told conversationally in social settings, often as a dialogue and with accompanying gestures and other nonverbal behavior. This paper presents a new corpus, the Story Dialogue with Gestures (SDG) corpus, consisting of 50 personal narratives regenerated as dialogues, complete with annotations of gesture placement and accompanying gesture forms. The corpus includes dialogues generated by human annotators, gesture annotations on the human generated dialogues, videos of story dialogues generated from this representation, video clips of each gesture used in the gesture annotations, and annotations of the original personal narratives with a deep representation of story called a Story Intention Graph. Our long term goal is the automatic generation of story co-tellings as animated dialogues from the Story Intention Graph. We expect this corpus to be a useful resource for researchers interested in natural language generation, intelligent virtual agents, generation of nonverbal behavior, and story and narrative representations.

pdf abs
A Verbal and Gestural Corpus of Story Retellings to an Expressive Embodied Virtual Character
Jackson Tolins | Kris Liu | Michael Neff | Marilyn Walker | Jean Fox Tree
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a corpus of 44 human-agent verbal and gestural story retellings designed to explore whether humans would gesturally entrain to an embodied intelligent virtual agent. We used a novel data collection method where an agent presented story components in installments, which the human would then retell to the agent. At the end of the installments, the human would then retell the embodied animated agent the story as a whole. This method was designed to allow us to observe whether changes in the agent’s gestural behavior would result in human gestural changes. The agent modified its gestures over the course of the story, by starting out the first installment with gestural behaviors designed to manifest extraversion, and slowly modifying gestures to express introversion over time, or the reverse. The corpus contains the verbal and gestural transcripts of the human story retellings. The gestures were coded for type, handedness, temporal structure, spatial extent, and the degree to which the participants’ gestures match those produced by the agent. The corpus illustrates the variation in expressive behaviors produced by users interacting with embodied virtual characters, and the degree to which their gestures were influenced by the agent’s dynamic changes in personality-based expressive style.

pdf abs
A Multimodal Motion-Captured Corpus of Matched and Mismatched Extravert-Introvert Conversational Pairs
Jackson Tolins | Kris Liu | Yingying Wang | Jean E. Fox Tree | Marilyn Walker | Michael Neff
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a new corpus, the Personality Dyads Corpus, consisting of multimodal data for three conversations between three personality-matched, two-person dyads (a total of 9 separate dialogues). Participants were selected from a larger sample to be 0.8 of a standard deviation above or below the mean on the Big-Five Personality extraversion scale, to produce an Extravert-Extravert dyad, an Introvert-Introvert dyad, and an Extravert-Introvert dyad. Each pair carried out conversations for three different tasks. The conversations were recorded using optical motion capture for the body and data gloves for the hands. Dyads’ speech was transcribed and the gestural and postural behavior was annotated with ANVIL. The released corpus includes personality profiles, ANVIL files containing speech transcriptions and the gestural annotations, and BVH files containing body and hand motion in 3D.

pdf abs
Internet Argument Corpus 2.0: An SQL schema for Dialogic Social Media and the Corpora to go with it
Rob Abbott | Brian Ecker | Pranav Anand | Marilyn Walker
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Large scale corpora have benefited many areas of research in natural language processing, but until recently, resources for dialogue have lagged behind. Now, with the emergence of large scale social media websites incorporating a threaded dialogue structure, content feedback, and self-annotation (such as stance labeling), there are valuable new corpora available to researchers. In previous work, we released the INTERNET ARGUMENT CORPUS, one of the first larger scale resources available for opinion sharing dialogue. We now release the INTERNET ARGUMENT CORPUS 2.0 (IAC 2.0) in the hope that others will find it as useful as we have. The IAC 2.0 provides more data than IAC 1.0 and organizes it using an extensible, repurposable SQL schema. The database structure in conjunction with the associated code facilitates querying from and combining multiple dialogically structured data sources. The IAC 2.0 schema provides support for forum posts, quotations, markup (bold, italic, etc.), and various annotations, including Stanford CoreNLP annotations. We demonstrate the generalizability of the schema by providing code to import the ConVote corpus.

pdf
Automatically Inferring Implicit Properties in Similes
Ashequl Qadir | Ellen Riloff | Marilyn A. Walker
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
NLDS-UCSC at SemEval-2016 Task 6: A Semi-Supervised Approach to Detecting Stance in Tweets
Amita Misra | Brian Ecker | Theodore Handleman | Nicolas Hahn | Marilyn Walker
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2015

pdf
Joint Models of Disagreement and Stance in Online Debate
Dhanya Sridhar | James Foulds | Bert Huang | Lise Getoor | Marilyn Walker
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf
Learning to Recognize Affective Polarity in Similes
Ashequl Qadir | Ellen Riloff | Marilyn Walker
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
Generating Sentence Planning Variations for Story Telling
Stephanie Lukin | Lena Reed | Marilyn Walker
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Argument Mining: Extracting Arguments from Online Dialogue
Reid Swanson | Brian Ecker | Marilyn Walker
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Using Summarization to Discover Argument Facets in Online Idealogical Dialog
Amita Misra | Pranav Anand | Jean E. Fox Tree | Marilyn Walker
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf abs
Getting Reliable Annotations for Sarcasm in Online Dialogues
Reid Swanson | Stephanie Lukin | Luke Eisenberg | Thomas Corcoran | Marilyn Walker
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The language used in online forums differs in many ways from that of traditional language resources such as news. One difference is the use and frequency of nonliteral, subjective dialogue acts such as sarcasm. Whether the aim is to develop a theory of sarcasm in dialogue, or engineer automatic methods for reliably detecting sarcasm, a major challenge is simply the difficulty of getting enough reliably labelled examples. In this paper we describe our work on methods for achieving highly reliable sarcasm annotations from untrained annotators on Mechanical Turk. We explore the use of a number of common statistical reliability measures, such as Kappa, Karger’s, Majority Class, and EM. We show that more sophisticated measures do not appear to yield better results for our data than simple measures such as assuming that the correct label is the one that a majority of Turkers apply.

pdf
Collective Stance Classification of Posts in Online Debate Forums
Dhanya Sridhar | Lise Getoor | Marilyn Walker
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media

pdf
Identifying Narrative Clause Types in Personal Stories
Reid Swanson | Elahe Rahimtoroghi | Thomas Corcoran | Marilyn Walker
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2013

pdf
Unsupervised Induction of Contingent Event Pairs from Film Scenes
Zhichao Hu | Elahe Rahimtoroghi | Larissa Munishkina | Reid Swanson | Marilyn A. Walker
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf
Really? Well. Apparently Bootstrapping Improves the Performance of Sarcasm and Nastiness Classifiers for Online Dialogue
Stephanie Lukin | Marilyn Walker
Proceedings of the Workshop on Language Analysis in Social Media

pdf
Topic Independent Identification of Agreement and Disagreement in Social Media Dialogue
Amita Misra | Marilyn Walker
Proceedings of the SIGDIAL 2013 Conference

2012

pdf
Stance Classification using Dialogic Properties of Persuasion
Marilyn Walker | Pranav Anand | Rob Abbott | Ricky Grant
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf abs
A Corpus for Research on Deliberation and Debate
Marilyn Walker | Jean Fox Tree | Pranav Anand | Rob Abbott | Joseph King
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Deliberative, argumentative discourse is an important component of opinion formation, belief revision, and knowledge discovery; it is a cornerstone of modern civil society. Argumentation is productively studied in branches ranging from theoretical artificial intelligence to political rhetoric, but empirical analysis has suffered from a lack of freely available, unscripted argumentative dialogs. This paper presents the Internet Argument Corpus (IAC), a set of 390,704 posts in 11,800 discussions extracted from the online debate site 4forums.com. A 2866 thread/130,206 post extract of the corpus has been manually sided for topic of discussion, and subsets of this topic-labeled extract have been annotated for several dialogic and argumentative markers: degrees of agreement with a previous post, cordiality, audience-direction, combativeness, assertiveness, emotionality of argumentation, and sarcasm. As an application of this resource, the paper closes with a discussion of the relationship between discourse marker pragmatics, agreement, emotionality, and sarcasm in the IAC corpus.

pdf abs
An Annotated Corpus of Film Dialogue for Learning and Characterizing Character Style
Marilyn Walker | Grace Lin | Jennifer Sawyer
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Interactive story systems often involve dialogue with virtual dramatic characters. However, to date most character dialogue is written by hand. One way to ease the authoring process is to (semi-)automatically generate dialogue based on film characters. We extract features from dialogue of film characters in leading roles. Then we use these character-based features to drive our language generator to produce interesting utterances. This paper describes a corpus of film dialogue that we have collected from the IMSDb archive and annotated for linguistic structures and character archetypes. We extract different sets of features using external sources such as LIWC and SentiWordNet as well as using our own written scripts. The automation of feature extraction also eases the process of acquiring additional film scripts. We briefly show how film characters can be represented by models learned from the corpus, how the models can be distinguished based on different categories such as gender and film genre, and how they can be applied to a language generator to generate utterances that can be perceived as being similar to the intended character model.

Spoken dialogue systems are common interfaces to backend data in information retrieval domains. As more data is made available on the Web and IE technology matures, dialogue systems, whether they be speech- or text-based, will be more in demand to provide user-friendly access to this data. However, dialogue systems must become both easier to configure, as well as more informative than the traditional form-based systems that are currently available. We present techniques in this paper to address the issue of automating both content selection for use in summary responses and in system initiative queries.

pdf abs
Simulating Cub Reporter Dialogues: The collection of naturalistic human-human dialogues for information access to text archives
Emma Barker | Ryuichiro Higashinaka | François Mairesse | Robert Gaizauskas | Marilyn Walker | Jonathan Foster
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes a dialogue data collection experiment and resulting corpus for dialogues between a senior mobile journalist and a junior cub reporter back at the office. The purpose of the dialogue is for the mobile journalist to collect background information in preparation for an interview or on-the-site coverage of a breaking story. The cub reporter has access to text archives that contain such background information. A unique aspect of these dialogues is that they capture information-seeking behavior for an open-ended task against a large unstructured data source. Initial analyses of the corpus show that the experimental design leads to real-time, mixed-initiative, highly interactive dialogues with many interesting properties.

pdf
Automatic Recognition of Personality in Conversation
François Mairesse | Marilyn Walker
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf
Learning to Generate Naturalistic Utterances Using Reviews in Spoken Dialogue Systems
Ryuichiro Higashinaka | Rashmi Prasad | Marilyn A. Walker
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics