David Traum

Also published as: David R. Traum

2024

We introduce the Situated Corpus Of Understanding Transactions (SCOUT), a multi-modal collection of human-robot dialogue in the task domain of collaborative exploration. The corpus was constructed from multiple Wizard-of-Oz experiments where human participants gave verbal instructions to a remotely-located robot to move and gather information about its surroundings. SCOUT contains 89,056 utterances and 310,095 words from 278 dialogues averaging 320 utterances per dialogue. The dialogues are aligned with the multi-modal data streams available during the experiments: 5,785 images and 30 maps. The corpus has been annotated with Abstract Meaning Representation and Dialogue-AMR to identify the speaker’s intent and meaning within an utterance, and with Transactional Units and Relations to track relationships between utterances to reveal patterns of the Dialogue Structure. We describe how the corpus and its annotations have been used to develop autonomous human-robot systems and enable research in open questions of how humans speak to robots. We release this corpus to accelerate progress in autonomous, situated, human-robot dialogue, especially in the context of navigation tasks where details about the environment need to be discovered.

2022

The ultimate goal of dialog research is to develop systems that can be effectively used in interactive settings by real users. To this end, we introduced the Interactive Evaluation of Dialog Track at the 9th Dialog System Technology Challenge. This track consisted of two sub-tasks. The first sub-task involved building knowledge-grounded response generation models. The second sub-task aimed to extend dialog models beyond static datasets by assessing them in an interactive setting with real users. Our track challenges participants to develop strong response generation models and explore strategies that extend them to back-and-forth interactions with real users. The progression from static corpora to interactive evaluation introduces unique challenges and facilitates a more thorough assessment of open-domain dialog systems. This paper provides an overview of the track, including the methodology and results. Furthermore, it provides insights into how to best evaluate open-domain dialog models.

pdf abs
Comparing Approaches to Language Understanding for Human-Robot Dialogue: An Error Taxonomy and Analysis
Ada Tur | David Traum
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we compare two different approaches to language understanding for a human-robot interaction domain in which a human commander gives navigation instructions to a robot. We contrast a relevance-based classifier with a GPT-2 model, using about 2000 input-output examples as training data. With this level of training data, the relevance-based model outperforms the GPT-2 based model 79% to 8%. We also present a taxonomy of types of errors made by each model, indicating that they have somewhat different strengths and weaknesses, so we also examine the potential for a combined model.

pdf abs
Evaluation of Off-the-shelf Speech Recognizers on Different Accents in a Dialogue Domain
Divya Tadimeti | Kallirroi Georgila | David Traum
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We evaluate several publicly available off-the-shelf (commercial and research) automatic speech recognition (ASR) systems on dialogue agent-directed English speech from speakers with General American vs. non-American accents. Our results show that the performance of the ASR systems for non-American accents is considerably worse than for General American accents. Depending on the recognizer, the absolute difference in performance between General American accents and all non-American accents combined can vary approximately from 2% to 12%, with relative differences varying approximately between 16% and 49%. This drop in performance becomes even larger when we consider specific categories of non-American accents indicating a need for more diligent collection of and training on non-native English speaker data in order to narrow this performance gap. There are performance differences across ASR systems, and while the same general pattern holds, with more errors for non-American accents, there are some accents for which the best recognizer is different than in the overall case. We expect these results to be useful for dialogue system designers in developing more robust inclusive dialogue systems, and for ASR providers in taking into account performance requirements for different accents.

2021

pdf abs
Builder, we have done it: Evaluating & Extending Dialogue-AMR NLU Pipeline for Two Collaborative Domains
Claire Bonial | Mitchell Abrams | David Traum | Clare Voss
Proceedings of the 14th International Conference on Computational Semantics (IWCS)

We adopt, evaluate, and improve upon a two-step natural language understanding (NLU) pipeline that incrementally tames the variation of unconstrained natural language input and maps to executable robot behaviors. The pipeline first leverages Abstract Meaning Representation (AMR) parsing to capture the propositional content of the utterance, and second converts this into “Dialogue-AMR,” which augments standard AMR with information on tense, aspect, and speech acts. Several alternative approaches and training datasets are evaluated for both steps and corresponding components of the pipeline, some of which outperform the original. We extend the Dialogue-AMR annotation schema to cover a different collaborative instruction domain and evaluate on both domains. With very little training data, we achieve promising performance in the new domain, demonstrating the scalability of this approach.

2020

This paper describes a schema that enriches Abstract Meaning Representation (AMR) in order to provide a semantic representation for facilitating Natural Language Understanding (NLU) in dialogue systems. AMR offers a valuable level of abstraction of the propositional content of an utterance; however, it does not capture the illocutionary force or speaker’s intended contribution in the broader dialogue context (e.g., make a request or ask a question), nor does it capture tense or aspect. We explore dialogue in the domain of human-robot interaction, where a conversational robot is engaged in search and navigation tasks with a human partner. To address the limitations of standard AMR, we develop an inventory of speech acts suitable for our domain, and present “Dialogue-AMR”, an enhanced AMR that represents not only the content of an utterance, but the illocutionary force behind it, as well as tense and aspect. To showcase the coverage of the schema, we use both manual and automatic methods to construct the “DialAMR” corpus—a corpus of human-robot dialogue annotated with standard AMR and our enriched Dialogue-AMR schema. Our automated methods can be used to incorporate AMR into a larger NLU pipeline supporting human-robot dialogue.

pdf abs
Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers
Kallirroi Georgila | Carla Gordon | Volodymyr Yanov | David Traum
Proceedings of the Twelfth Language Resources and Evaluation Conference

We collected a corpus of dialogues in a Wizard of Oz (WOz) setting in the Internet of Things (IoT) domain. We asked users participating in these dialogues to rate the system on a number of aspects, namely, intelligence, naturalness, personality, friendliness, their enjoyment, overall quality, and whether they would recommend the system to others. Then we asked dialogue observers, i.e., Amazon Mechanical Turkers (MTurkers), to rate these dialogues on the same aspects. We also generated simulated dialogues between dialogue policies and simulated users and asked MTurkers to rate them again on the same aspects. Using linear regression, we developed dialogue evaluation functions based on features from the simulated dialogues and the MTurkers’ ratings, the WOz dialogues and the MTurkers’ ratings, and the WOz dialogues and the WOz participants’ ratings. We applied all these dialogue evaluation functions to a held-out portion of our WOz dialogues, and we report results on the predictive power of these different types of dialogue evaluation functions. Our results suggest that for three conversational aspects (intelligence, naturalness, overall quality) just training evaluation functions on simulated data could be sufficient.

pdf abs
Which Model Should We Use for a Real-World Conversational Dialogue System? a Cross-Language Relevance Model or a Deep Neural Net?
Seyed Hossein Alavi | Anton Leuski | David Traum
Proceedings of the Twelfth Language Resources and Evaluation Conference

We compare two models for corpus-based selection of dialogue responses: one based on cross-language relevance with a cross-language LSTM model. Each model is tested on multiple corpora, collected from two different types of dialogue source material. Results show that while the LSTM model performs adequately on a very large corpus (millions of utterances), its performance is dominated by the cross-language relevance model for a more moderate-sized corpus (ten thousands of utterances).

pdf abs
Exploring a Choctaw Language Corpus with Word Vectors and Minimum Distance Length
Jacqueline Brixey | David Sides | Timothy Vizthum | David Traum | Khalil Iskarous
Proceedings of the Twelfth Language Resources and Evaluation Conference

This work introduces additions to the corpus ChoCo, a multimodal corpus for the American indigenous language Choctaw. Using texts from the corpus, we develop new computational resources by using two off-the-shelf tools: word2vec and Linguistica. Our work illustrates how these tools can be successfully implemented with a small corpus.

pdf abs
Evaluation of Off-the-shelf Speech Recognizers Across Diverse Dialogue Domains
Kallirroi Georgila | Anton Leuski | Volodymyr Yanov | David Traum
Proceedings of the Twelfth Language Resources and Evaluation Conference

We evaluate several publicly available off-the-shelf (commercial and research) automatic speech recognition (ASR) systems across diverse dialogue domains (in US-English). Our evaluation is aimed at non-experts with limited experience in speech recognition. Our goal is not only to compare a variety of ASR systems on several diverse data sets but also to measure how much ASR technology has advanced since our previous large-scale evaluations on the same data sets. Our results show that the performance of each speech recognizer can vary significantly depending on the domain. Furthermore, despite major recent progress in ASR technology, current state-of-the-art speech recognizers perform poorly in domains that require special vocabulary and language models, and under noisy conditions. We expect that our evaluation will prove useful to ASR consumers and dialogue system designers.

2019

pdf abs
A Blissymbolics Translation System
Usman Sohail | David Traum
Proceedings of the Eighth Workshop on Speech and Language Processing for Assistive Technologies

Blissymbolics (Bliss) is a pictographic writing system that is used by people with communication disorders. Bliss attempts to create a writing system that makes words easier to distinguish by using pictographic symbols that encapsulate meaning rather than sound, as the English alphabet does for example. Users of Bliss rely on human interpreters to use Bliss. We created a translation system from Bliss to natural English with the hopes of decreasing the reliance on human interpreters by the Bliss community. We first discuss the basic rules of Blissymbolics. Then we point out some of the challenges associated with developing computer assisted tools for Blissymbolics. Next we talk about our ongoing work in developing a translation system, including current limitations, and future work. We conclude with a set of examples showing the current capabilities of our translation system.

We detail refinements made to Abstract Meaning Representation (AMR) that make the representation more suitable for supporting a situated dialogue system, where a human remotely controls a robot for purposes of search and rescue and reconnaissance. We propose 36 augmented AMRs that capture speech acts, tense and aspect, and spatial information. This linguistic information is vital for representing important distinctions, for example whether the robot has moved, is moving, or will move. We evaluate two existing AMR parsers for their performance on dialogue data. We also outline a model for graph-to-graph conversion, in which output from AMR parsers is converted into our refined AMRs. The design scheme presented here, though task-specific, is extendable for broad coverage of speech acts using AMR in future task-independent work.

pdf bib
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Anna Korhonen | David Traum | Lluís Màrquez
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

2018

ScoutBot is a dialogue interface to physical and simulated robots that supports collaborative exploration of environments. The demonstration will allow users to issue unconstrained spoken language commands to ScoutBot. ScoutBot will prompt for clarification if the user’s instruction needs additional input. It is trained on human-robot dialogue collected from Wizard-of-Oz experiments, where robot responses were initiated by a human wizard in previous interactions. The demonstration will show a simulated ground robot (Clearpath Jackal) in a simulated environment supported by ROS (Robot Operating System).

This paper identifies stylistic differences in instruction-giving observed in a corpus of human-robot dialogue. Differences in verbosity and structure (i.e., single-intent vs. multi-intent instructions) arose naturally without restrictions or prior guidance on how users should speak with the robot. Different styles were found to produce different rates of miscommunication, and correlations were found between style differences and individual user variation, trust, and interaction experience with the robot. Understanding potential consequences and factors that influence style can inform design of dialogue systems that are robust to natural variation from human users.

pdf
Identification of Personal Information Shared in Chat-Oriented Dialogue
Sarah Fillwock | David Traum
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Robot-directed communication is variable, and may change based on human perception of robot capabilities. To collect training data for a dialogue system and to investigate possible communication changes over time, we developed a Wizard-of-Oz study that (a) simulates a robot’s limited understanding, and (b) collects dialogues where human participants build a progressively better mental model of the robot’s understanding. With ten participants, we collected ten hours of human-robot dialogue. We analyzed the structure of instructions that participants gave to a remote robot before it responded. Our findings show a general initial preference for including metric information (e.g., move forward 3 feet) over landmarks (e.g., move to the desk) in motion commands, but this decreased over time, suggesting changes in perception.

DialPort collects user data for connected spoken dialog systems. At present six systems are linked to a central portal that directs the user to the applicable system and suggests systems that the user may be interested in. User data has started to flow into the system.

2016

pdf
New Dimensions in Testimony Demonstration
Ron Artstein | Alesia Gainer | Kallirroi Georgila | Anton Leuski | Ari Shapiro | David Traum
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

pdf abs
Towards a Multi-dimensional Taxonomy of Stories in Dialogue
Kathryn J. Collins | David Traum
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we present a taxonomy of stories told in dialogue. We based our scheme on prior work analyzing narrative structure and method of telling, relation to storyteller identity, as well as some categories particular to dialogue, such as how the story gets introduced. Our taxonomy currently has 5 major dimensions, with most having sub-dimensions - each dimension has an associated set of dimension-specific labels. We adapted an annotation tool for this taxonomy and have annotated portions of two different dialogue corpora, Switchboard and the Distress Analysis Interview Corpus. We present examples of some of the tags and concepts with stories from Switchboard, and some initial statistics of frequencies of the tags.

pdf abs
Towards Automatic Identification of Effective Clues for Team Word-Guessing Games
Eli Pincus | David Traum
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Team word-guessing games where one player, the clue-giver, gives clues attempting to elicit a target-word from another player, the receiver, are a popular form of entertainment and also used for educational purposes. Creating an engaging computational agent capable of emulating a talented human clue-giver in a timed word-guessing game depends on the ability to provide effective clues (clues able to elicit a correct guess from a human receiver). There are many available web resources and databases that can be mined for the raw material for clues for target-words; however, a large number of those clues are unlikely to be able to elicit a correct guess from a human guesser. In this paper, we propose a method for automatically filtering a clue corpus for effective clues for an arbitrary target-word from a larger set of potential clues, using machine learning on a set of features of the clues, including point-wise mutual information between a clue’s constituent words and a clue’s target-word. The results of the experiments significantly improve the average clue quality over previous approaches, and bring quality rates in-line with measures of human clue quality derived from a corpus of human-human interactions. The paper also introduces the data used to develop this method; audio recordings of people making guesses after having heard the clues being spoken by a synthesized voice.

pdf
Analyzing the Effect of Entrainment on Dialogue Acts
Masahiro Mizukami | Koichiro Yoshino | Graham Neubig | David Traum | Satoshi Nakamura
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2015

pdf
Reinforcement Learning in Multi-Party Trading Dialog
Takuya Hiraoka | Kallirroi Georgila | Elnaz Nouri | David Traum | Satoshi Nakamura
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Which Synthetic Voice Should I Choose for an Evocative Task?
Eli Pincus | Kallirroi Georgila | David Traum
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
Evaluating Spoken Dialogue Processing for Time-Offset Interaction
David Traum | Kallirroi Georgila | Ron Artstein | Anton Leuski
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
The Real Challenge 2014: Progress and Prospects
Maxine Eskenazi | Alan W Black | Sungjin Lee | David Traum
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

The Distress Analysis Interview Corpus (DAIC) contains clinical interviews designed to support the diagnosis of psychological distress conditions such as anxiety, depression, and post traumatic stress disorder. The interviews are conducted by humans, human controlled agents and autonomous agents, and the participants include both distressed and non-distressed individuals. Data collected include audio and video recordings and extensive questionnaire responses; parts of the corpus have been transcribed and annotated for a variety of verbal and non-verbal features. The corpus has been used to support the creation of an automated interviewer agent, and for research on the automatic identification of psychological distress.

pdf
Single-Agent vs. Multi-Agent Techniques for Concurrent Reinforcement Learning of Negotiation Dialogue Policies
Kallirroi Georgila | Claire Nelson | David Traum
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Initiative Taking in Negotiation
Elnaz Nouri | David Traum
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

pdf
SAWDUST: a Semi-Automated Wizard Dialogue Utterance Selection Tool for domain-independent large-domain dialogue
Sudeep Gandhe | David Traum
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

pdf
A Demonstration of Dialogue Processing in SimSensei Kiosk
Fabrizio Morbini | David DeVault | Kallirroi Georgila | Ron Artstein | David Traum | Louis-Philippe Morency
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2013

pdf
A method for the approximation of incremental understanding of explicit utterance meaning using predictive models in finite domains
David DeVault | David Traum
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Surface Text based Dialogue Models for Virtual Humans
Sudeep Gandhe | David Traum
Proceedings of the SIGDIAL 2013 Conference

2012

pdf
Reinforcement Learning of Question-Answering Dialogue Policies for Virtual Museum Guides
Teruhisa Misu | Kallirroi Georgila | Anton Leuski | David Traum
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
A Demonstration of Incremental Speech Understanding and Confidence Estimation in a Virtual Human Dialogue System
David DeVault | David Traum
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)
Maxine Eskenazi | Alan Black | David Traum
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)

This paper summarizes the latest, final version of ISO standard 24617-2 ``Semantic annotation framework, Part 2: Dialogue acts"""". Compared to the preliminary version ISO DIS 24617-2:2010, described in Bunt et al. (2010), the final version additionally includes concepts for annotating rhetorical relations between dialogue units, defines a full-blown compositional semantics for the Dialogue Act Markup Language DiAML (resulting, as a side-effect, in a different treatment of functional dependence relations among dialogue acts and feedback dependence relations); and specifies an optimally transparent XML-based reference format for the representation of DiAML annotations, based on the systematic application of the notion of `ideal concrete syntax'. We describe these differences and briefly discuss the design and implementation of an incremental method for dialogue act recognition, which proves the usability of the ISO standard for automatic dialogue annotation.

pdf abs
Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems
Kallirroi Georgila | Alan Black | Kenji Sagae | David Traum
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The current practice in virtual human dialogue systems is to use professional human recordings or limited-domain speech synthesis. Both approaches lead to good performance but at a high cost. To determine the best trade-off between performance and cost, we perform a systematic evaluation of human and synthesized voices with regard to naturalness, conversational aspect, and likability. We vary the type (in-domain vs. out-of-domain), length, and content of utterances, and take into account the age and native language of raters as well as their familiarity with speech synthesis. We present detailed results from two studies, a pilot one and one run on Amazon's Mechanical Turk. Our results suggest that a professional human voice can supersede both an amateur human voice and synthesized voices. Also, a high-quality general-purpose voice or a good limited-domain voice can perform better than amateur human recordings. We do not find any significant differences between the performance of a high-quality general-purpose voice and a limited-domain voice, both trained with speech recorded by actors. As expected, the high-quality general-purpose voice is rated higher than the limited-domain voice for out-of-domain sentences and lower for in-domain sentences. There is also a trend for long or negative-content utterances to receive lower ratings.

The Twins corpus is a collection of utterances spoken in interactions with two virtual characters who serve as guides at the Museum of Science in Boston. The corpus contains about 200,000 spoken utterances from museum visitors (primarily children) as well as from trained handlers who work at the museum. In addition to speech recordings, the corpus contains the outputs of speech recognition performed at the time of utterance as well as the system interpretation of the utterances. Parts of the corpus have been manually transcribed and annotated for question interpretation. The corpus has been used for improving performance of the museum characters and for a variety of research projects, such as phonetic-based Natural Language Understanding, creation of conversational characters from text resources, dialogue policy learning, and research on patterns of user interaction. It has the potential to be used for research on children's speech and on language used when talking to a virtual human.

pdf
Incremental Speech Understanding in a Multi-Party Virtual Human Dialogue System
David DeVault | David Traum
Proceedings of the Demonstration Session at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

pdf bib
Proceedings of the SIGDIAL 2011 Conference
Joyce Y. Chai | Johanna D. Moore | Rebecca J. Passonneau | David R. Traum
Proceedings of the SIGDIAL 2011 Conference

pdf
Rapid Development of Advanced Question-Answering Characters by Non-experts
Sudeep Gandhe | Alysa Taylor | Jillian Gerten | David Traum
Proceedings of the SIGDIAL 2011 Conference

2010

pdf
Don’t tell anyone! Two Experiments on Gossip Conversations
Jenny Brusk | Ron Artstein | David Traum
Proceedings of the SIGDIAL 2010 Conference

pdf
I’ve said it before, and I’ll say it again: An empirical investigation of the upper bound of the selection approach to dialogue
Sudeep Gandhe | David Traum
Proceedings of the SIGDIAL 2010 Conference

pdf
Interpretation of Partial Utterances in Virtual Human Dialogue Systems
Kenji Sagae | David DeVault | David Traum
Proceedings of the NAACL HLT 2010 Demonstration Session

This paper describes an ISO project which aims at developing a standard for annotating spoken and multimodal dialogue with semantic information concerning the communicative functions of utterances, the kind of semantic content they address, and their relations with what was said and done earlier in the dialogue. The project, ISO 24617-2 ""Semantic annotation framework, Part 2: Dialogue acts"", is currently at DIS stage. The proposed annotation schema distinguishes 9 orthogonal dimensions, allowing each functional segment in dialogue to have a function in each of these dimensions, thus accounting for the multifunctionality that utterances in dialogue often have. A number of core communicative functions is defined in the form of ISO data categories, available at http://semantic-annotation.uvt.nl/dialogue-acts/iso-datcats.pdf; they are divided into ""dimension-specific"" functions, which can be used only in a particular dimension, such as Turn Accept in the Turn Management dimension, and ""general-purpose"" functions, which can be used in any dimension, such as Inform and Request. An XML-based annotation language, ""DiAML"" is defined, with an abstract syntax, a semantics, and a concrete syntax.

pdf abs
NPCEditor: A Tool for Building Question-Answering Characters
Anton Leuski | David Traum
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

NPCEditor is a system for building and deploying virtual characters capable of engaging a user in spoken dialog on a limited domain. The dialogue may take any form as long as the character responses can be specified a priori. For example, NPCEditor has been used for constructing question answering characters where a user asks questions and the character responds, but other scenarios are possible. At the core of the system is a state of the art statistical language classification technology for mapping from user's text input to system responses. NPCEditor combines the classifier with a database that stores the character information and relevant language data, a server that allows the character designer to deploy the completed characters, and a user-friendly editor that helps the designer to accomplish both character design and deployment tasks. In the paper we define the overall system architecture, describe individual NPCEditor components, and guide the reader through the steps of building a virtual character.

pdf abs
Dialogues in Context: An Objective User-Oriented Evaluation Approach for Virtual Human Dialogue
Susan Robinson | Antonio Roque | David Traum
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

As conversational agents are now being developed to encounter more complex dialogue situations it is increasingly difficult to find satisfactory methods for evaluating these agents. Task-based measures are insufficient where there is no clearly defined task. While user-based evaluation methods may give a general sense of the quality of an agent's performance, they shed little light on the relative quality or success of specific features of dialogue that are necessary for system improvement. This paper examines current dialogue agent evaluation practices and motivates the need for a more detailed approach for defining and measuring the quality of dialogues between agent and user. We present a framework for evaluating the dialogue competence of artificial agents involved in complex and underspecified tasks when conversing with people. A multi-part coding scheme is proposed that provides a qualitative analysis of human utterances, and rates the appropriateness of the agent's responses to these utterances. The scheme is outlined, and then used to evaluate Staff Duty Officer Moleno, a virtual guide in Second Life.

pdf abs
Practical Evaluation of Speech Recognizers for Virtual Human Dialogue Systems
Xuchen Yao | Pravin Bhutada | Kallirroi Georgila | Kenji Sagae | Ron Artstein | David Traum
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We perform a large-scale evaluation of multiple off-the-shelf speech recognizers across diverse domains for virtual human dialogue systems. Our evaluation is aimed at speech recognition consumers and potential consumers with limited experience with readily available recognizers. We focus on practical factors to determine what levels of performance can be expected from different available recognizers in various projects featuring different types of conversational utterances. Our results show that there is no single recognizer that outperforms all other recognizers in all domains. The performance of each recognizer may vary significantly depending on the domain, the size and perplexity of the corpus, the out-of-vocabulary rate, and whether acoustic and language model adaptation has been used or not. We expect that our evaluation will prove useful to other speech recognition consumers, especially in the dialogue community, and will shed some light on the key problem in spoken dialogue systems of selecting the most suitable available speech recognition system for a particular application, and what impact training will have.

2009

pdf
Towards Natural Language Understanding of Partial Speech Recognition Results in Dialogue Systems
Kenji Sagae | Gwen Christian | David DeVault | David Traum
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf
A computational account of comparative implicatures for a spoken dialogue agent
Luciana Benotti | David Traum
Proceedings of the Eight International Conference on Computational Semantics

pdf bib
Can I Finish? Learning When to Respond to Incremental Interpretation Results in Interactive Dialogue
David DeVault | Kenji Sagae | David Traum
Proceedings of the SIGDIAL 2009 Conference

pdf bib
HCSNet Plenary Talk: Spoken Dialogue Models for Virtual Humans
David Traum
Proceedings of the Australasian Language Technology Association Workshop 2009

2008

pdf abs
A Common Ground for Virtual Humans: Using an Ontology in a Natural Language Oriented Virtual Human Architecture
Arno Hartholt | Thomas Russ | David Traum | Eduard Hovy | Susan Robinson
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

When dealing with large, distributed systems that use state-of-the-art components, individual components are usually developed in parallel. As development continues, the decoupling invariably leads to a mismatch between how these components internally represent concepts and how they communicate these representations to other components: representations can get out of synch, contain localized errors, or become manageable only by a small group of experts for each module. In this paper, we describe the use of an ontology as part of a complex distributed virtual human architecture in order to enable better communication between modules while improving the overall flexibility needed to change or extend the system. We focus on the natural language understanding capabilities of this architecture and the relationship between language and concepts within the entire system in general and the ontology in particular.

pdf abs
What would you Ask a conversational Agent? Observations of Human-Agent Dialogues in a Museum Setting
Susan Robinson | David Traum | Midhun Ittycheriah | Joe Henderer
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Embodied Conversational Agents have typically been constructed for use in limited domain applications, and tested in very specialized environments. Only in recent years have there been more cases of moving agents into wider public applications (e.g.Bell et al., 2003; Kopp et al., 2005). Yet little analysis has been done to determine the differing needs, expectations, and behavior of human users in these environments. With an increasing trend for virtual characters to go public, we need to expand our understanding of what this entails for the design and capabilities of our characters. This paper explores these issues through an analysis of a corpus that has been collected since December 2006, from interactions with the virtual character Sgt Blackwell at the Cooper Hewitt Museum in New York. The analysis includes 82 hierarchical categories of user utterances, as well as specific observations on user preferences and behaviors drawn from interactions with Blackwell.

pdf
Degrees of Grounding Based on Evidence of Understanding
Antonio Roque | David Traum
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

pdf
Evaluation Understudy for Dialogue Coherence Models
Sudeep Gandhe | David Traum
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

pdf
Making Grammar-Based Generation Easier to Deploy in Dialogue Systems
David DeVault | David Traum | Ron Artstein
Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue

pdf
Practical Grammar-Based NLG from Examples
David DeVault | David Traum | Ron Artstein
Proceedings of the Fifth International Natural Language Generation Conference

2007

pdf
Dynamic Movement and Positioning of Embodied Agents in Multiparty Conversations
Dušan Jan | David Traum
Proceedings of the Workshop on Embodied Language Processing

pdf
A Model of Compliance and Emotion for Potentially Adversarial Dialogue Agents
Antonio Roque | David Traum
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

2006

pdf
Building Effective Question Answering Characters
Anton Leuski | Ronakkumar Patel | David Traum | Brandon Kennedy
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

pdf
An Information State-Based Dialogue Manager for Call for Fire Dialogues
Antonio Roque | David Traum
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

2005

This paper describes an implemented algorithm for syntactic realization of a target-language sentence from an interlingual representation called Lexical Conceptual Structure (LCS). We provide a mapping between LCS thematic roles and Abstract Meaning Representation (AMR) relations; these relations serve as input to an off-the-shelf generator (Nitrogen). There are two contributions of this work: (1) the development of a thematic hierarchy that provides ordering information for realization of arguments in their surface positions; (2) the provision of a diagnostic tool for detecting inconsistencies in an existing online LCS-based lexicon that allows us to enhance principles for thematic-role assignment.