Ramesh Manuvinakurike

Also published as: Ramesh Manuvirakurike

2023

With the power of large pretrained language models, various research works have integrated knowledge into dialogue systems. The traditional techniques treat knowledge as part of the input sequence for the dialogue system, prepending a set of knowledge statements in front of dialogue history. However, such a mechanism forces knowledge sets to be concatenated in an ordered manner, making models implicitly pay imbalanced attention to the sets during training. In this paper, we first investigate how the order of the knowledge set can influence autoregressive dialogue systems’ responses. We conduct experiments on two commonly used dialogue datasets with two types of transformer-based models and find that models view the input knowledge unequally. To this end, we propose a simple and novel technique to alleviate the order effect by modifying the position embeddings of knowledge input in these models. With the proposed position embedding method, the experimental results show that each knowledge statement is uniformly considered to generate responses.

2022

Conversational assistants are ubiquitous among the general population, however, these systems have not had an impact on people with disabilities, or speech and language disorders, for whom basic day-to-day communication and social interaction is a huge struggle. Language model technology can play a huge role in empowering these users and help them interact with others with less effort via interaction support. To enable this population, we build a system that can represent them in a social conversation and generate responses that can be controlled by the users using cues/keywords. We build models that can speed up this communication by suggesting relevant cues in the dialog response context. We also introduce a keyword-loss to lexically constrain the model response output. We present automatic and human evaluation of our cue/keyword predictor and the controllable dialog system to show that our models perform significantly better than models without control. Our evaluation and user study shows that keyword-control on end-to-end response generation models is powerful and can enable and empower users with degenerative disorders to carry out their day-to-day communication.

Intelligent conversational assistants have become an integral part of our lives for performing simple tasks. However, such agents, for example, Google bots, Alexa and others are yet to have any social impact on minority population, for example, for people with neurological disorders and people with speech, language and social communication disorders, sometimes with locked-in states where speaking or typing is a challenge. Language model technologies can be very powerful tools in enabling these users to carry out daily communication and social interactions. In this work, we present a system that users with varied levels of disabilties can use to interact with the world, supported by eye-tracking, mouse controls and an intelligent agent Cue-bot, that can represent the user in a conversation. The agent provides relevant controllable ‘cues’ to generate desirable responses quickly for an ongoing dialog context. In the context of usage of such systems for people with degenerative disorders, we present automatic and human evaluation of our cue/keyword predictor and the controllable dialog system and show that our models perform significantly better than models without control and can also reduce user effort (fewer keystrokes) and speed up communication (typing time) significantly.

pdf abs
Strategy-level Entrainment of Dialogue System Users in a Creative Visual Reference Resolution Task
Deepthi Karkada | Ramesh Manuvinakurike | Maike Paetzel-Prüsmann | Kallirroi Georgila
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this work, we study entrainment of users playing a creative reference resolution game with an autonomous dialogue system. The language understanding module in our dialogue system leverages annotated human-wizard conversational data, openly available knowledge graphs, and crowd-augmented data. Unlike previous entrainment work, our dialogue system does not attempt to make the human conversation partner adopt lexical items in their dialogue, but rather to adapt their descriptive strategy to one that is simpler to parse for our natural language understanding unit. By deploying this dialogue system through a crowd-sourced study, we show that users indeed entrain on a “strategy-level” without the change of strategy impinging on their creativity. Our work thus presents a promising future research direction for developing dialogue management systems that can strategically influence people’s descriptive strategy to ease the system’s language understanding in creative tasks.

The CODI-CRAC 2022 Shared Task on Anaphora Resolution in Dialogues is the second edition of an initiative focused on detecting different types of anaphoric relations in conversations of different kinds. Using five conversational datasets, four of which have been newly annotated with a wide range of anaphoric relations: identity, bridging references and discourse deixis, we defined multiple tasks focusing individually on these key relations. The second edition of the shared task maintained the focus on these relations and used the same datasets as in 2021, but new test data were annotated, the 2021 data were checked, and new subtasks were added. In this paper, we discuss the annotation schemes, the datasets, the evaluation scripts used to assess the system performance on these tasks, and provide a brief summary of the participating systems and the results obtained across 230 runs from three teams, with most submissions achieving significantly better results than our baseline methods.

2021

pdf abs
Incremental temporal summarization in multi-party meetings
Ramesh Manuvinakurike | Saurav Sahay | Wenda Chen | Lama Nachman
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

In this work, we develop a dataset for incremental temporal summarization in a multiparty dialogue. We use crowd-sourcing paradigm with a model-in-loop approach for collecting the summaries and compare the data with the expert summaries. We leverage the question generation paradigm to automatically generate questions from the dialogue, which can be used to validate the user participation and potentially also draw attention of the user towards the contents then need to summarize. We then develop several models for abstractive summary generation in the Incremental temporal scenario. We perform a detailed analysis of the results and show that including the past context into the summary generation yields better summaries.

pdf bib
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

pdf bib abs
The CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue
Sopan Khosla | Juntao Yu | Ramesh Manuvinakurike | Vincent Ng | Massimo Poesio | Michael Strube | Carolyn Rosé
Proceedings of the CODI-CRAC 2021 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

In this paper, we provide an overview of the CODI-CRAC 2021 Shared-Task: Anaphora Resolution in Dialogue. The shared task focuses on detecting anaphoric relations in different genres of conversations. Using five conversational datasets, four of which have been newly annotated with a wide range of anaphoric relations: identity, bridging references and discourse deixis, we defined multiple subtasks focusing individually on these key relations. We discuss the evaluation scripts used to assess the system performance on these subtasks, and provide a brief summary of the participating systems and the results obtained across ?? runs from 5 teams, with most submissions achieving significantly better results than our baseline methods.

pdf bib abs
Estimating Subjective Crowd-Evaluations as an Additional Objective to Improve Natural Language Generation
Jakob Nyberg | Maike Paetzel | Ramesh Manuvinakurike
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

Human ratings are one of the most prevalent methods to evaluate the performance of NLP (natural language processing) algorithms. Similarly, it is common to measure the quality of sentences generated by a natural language generation model using human raters. In this paper we argue for exploring the use of subjective evaluations within the process of training language generation models in a multi-task learning setting. As a case study, we use a crowd-authored dialogue corpus to fine-tune six different language generation models. Two of these models incorporate multi-task learning and use subjective ratings of lines as part of an explicit learning goal. A human evaluation of the generated dialogue lines reveals that utterances generated by the multi-tasking models were subjectively rated as the most typical, most moving the conversation forward, and least offensive. Based on these promising first results, we discuss future research directions for incorporating subjective human evaluations into language model training and to hence keep the human user in the loop during the development process.

pdf abs
Context or No Context? A preliminary exploration of human-in-the-loop approach for Incremental Temporal Summarization in meetings
Nicole Beckage | Shachi H Kumar | Saurav Sahay | Ramesh Manuvinakurike
Proceedings of the Third Workshop on New Frontiers in Summarization

Incremental meeting temporal summarization, summarizing relevant information of partial multi-party meeting dialogue, is emerging as the next challenge in summarization research. Here we examine the extent to which human abstractive summaries of the preceding increments (context) can be combined with extractive meeting dialogue to generate abstractive summaries. We find that previous context improves ROUGE scores. Our findings further suggest that contexts begin to outweigh the dialogue. Using keyphrase extraction and semantic role labeling (SRL), we find that SRL captures relevant information without overwhelming the the model architecture. By compressing the previous contexts by ~70%, we achieve better ROUGE scores over our baseline models. Collectively, these results suggest that context matters, as does the way in which context is presented to the model.

2020

pdf abs
RDG-Map: A Multimodal Corpus of Pedagogical Human-Agent Spoken Interactions.
Maike Paetzel | Deepthi Karkada | Ramesh Manuvinakurike
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents a multimodal corpus of 209 spoken game dialogues between a human and a remote-controlled artificial agent. The interactions involve people collaborating with the agent to identify countries on the world map as quickly as possible, which allows studying rapid and spontaneous dialogue with complex anaphoras, disfluent utterances and incorrect descriptions. The corpus consists of two parts: 8 hours of game interactions have been collected with a virtual unembodied agent online and 26.8 hours have been recorded with a physically embodied robot in a research lab. In addition to spoken audio recordings available for both parts, camera recordings and skeleton-, facial expression- and eye-gaze tracking data have been collected for the lab-based part of the corpus. In this paper, we introduce the pedagogical reference resolution game (RDG-Map) and the characteristics of the corpus collected. We also present an annotation scheme we developed in order to study the dialogue strategies utilized by the players. Based on a subset of 330 minutes of interactions annotated so far, we discuss initial insights into these strategies as well as the potential of the corpus for future research.

pdf abs
Nontrivial Lexical Convergence in a Geography-Themed Game
Amanda Bergqvist | Ramesh Manuvinakurike | Deepthi Karkada | Maike Paetzel
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

The present study aims to examine the prevalent notion that people entrain to the vocabulary of a dialogue system. Although previous research shows that people will replace their choice of words with simple substitutes, studies using more challenging substitutions are sparse. In this paper, we investigate whether people adapt their speech to the vocabulary of a dialogue system when the system’s suggested words are not direct synonyms. 32 participants played a geography-themed game with a remote-controlled agent and were primed by referencing strategies (rather than individual terms) introduced in follow-up questions. Our results suggest that context-appropriate substitutes support convergence and that the convergence has a lasting effect within a dialogue session if the system’s wording is more consistent with the norms of the domain than the original wording of the speaker.

2018

pdf bib
DialEdit: Annotations for Spoken Conversational Image Editing
Ramesh Manuvirakurike | Jacqueline Brixey | Trung Bui | Walter Chang | Ron Artstein | Kallirroi Georgila
Proceedings of the 14th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

pdf
A Dialogue Annotation Scheme for Weight Management Chat using the Trans-Theoretical Model of Health Behavior Change
Ramesh Manuvirakurike | Sumanth Bharawadj | Kallirroi Georgila
Proceedings of the 14th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

pdf
Towards Understanding End-of-trip Instructions in a Taxi Ride Scenario
Deepthi Karkada | Ramesh Manuvirakurike | Kallirroi Georgila
Proceedings of the 14th Joint ACL-ISO Workshop on Interoperable Semantic Annotation

pdf abs
Conversational Image Editing: Incremental Intent Identification in a New Dialogue Task
Ramesh Manuvinakurike | Trung Bui | Walter Chang | Kallirroi Georgila
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

We present “conversational image editing”, a novel real-world application domain combining dialogue, visual information, and the use of computer vision. We discuss the importance of dialogue incrementality in this task, and build various models for incremental intent identification based on deep learning and traditional classification algorithms. We show how our model based on convolutional neural networks outperforms models based on random forests, long short term memory networks, and conditional random fields. By training embeddings based on image-related dialogue corpora, we outperform pre-trained out-of-the-box embeddings, for intention identification tasks. Our experiments also provide evidence that incremental intent processing may be more efficient for the user and could save time in accomplishing tasks.

pdf
Edit me: A Corpus and a Framework for Understanding Natural Language Image Editing
Ramesh Manuvinakurike | Jacqueline Brixey | Trung Bui | Walter Chang | Doo Soon Kim | Ron Artstein | Kallirroi Georgila
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf abs
Using Reinforcement Learning to Model Incrementality in a Fast-Paced Dialogue Game
Ramesh Manuvinakurike | David DeVault | Kallirroi Georgila
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

We apply Reinforcement Learning (RL) to the problem of incremental dialogue policy learning in the context of a fast-paced dialogue game. We compare the policy learned by RL with a high-performance baseline policy which has been shown to perform very efficiently (nearly as well as humans) in this dialogue game. The RL policy outperforms the baseline policy in offline simulations (based on real user data). We provide a detailed comparison of the RL policy and the baseline policy, including information about how much effort and time it took to develop each one of them. We also highlight the cases where the RL policy performs better, and show that understanding the RL policy can provide valuable insights which can inform the creation of an even better rule-based policy.

2016

PentoRef is a corpus of task-oriented dialogues collected in systematically manipulated settings. The corpus is multilingual, with English and German sections, and overall comprises more than 20000 utterances. The dialogues are fully transcribed and annotated with referring expressions mapped to objects in corresponding visual scenes, which makes the corpus a rich resource for research on spoken referring expressions in generation and resolution. The corpus includes several sub-corpora that correspond to different dialogue situations where parameters related to interactivity, visual access, and verbal channel have been manipulated in systematic ways. The corpus thus lends itself to very targeted studies of reference in spontaneous dialogue.

pdf
Real-Time Understanding of Complex Discriminative Scene Descriptions
Ramesh Manuvinakurike | Casey Kennington | David DeVault | David Schlangen
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Toward incremental dialogue act segmentation in fast-paced interactive dialogue systems
Ramesh Manuvinakurike | Maike Paetzel | Cheng Qu | David Schlangen | David DeVault
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue