Stefan Ultes


2021

Naturalness Evaluation of Natural Language Generation in Task-oriented Dialogues Using BERT
Ye Liu | Wolfgang Maier | Wolfgang Minker | Stefan Ultes
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

This paper presents an automatic method to evaluate the naturalness of natural language generation in dialogue systems. While this task was previously performed through expensive and time-consuming human labor, we introduce the novel task of automatically evaluating the naturalness of generated language. By fine-tuning the BERT model, our proposed naturalness evaluation method shows robust results and outperforms the baselines: support vector machines, bi-directional LSTMs, and BLEURT. In addition, the training speed and evaluation performance of the naturalness model are improved by transferring linguistic knowledge learned for quality and informativeness evaluation.

From Argument Search to Argumentative Dialogue: A Topic-independent Approach to Argument Acquisition for Dialogue Systems
Niklas Rach | Carolin Schindler | Isabel Feustel | Johannes Daxenberger | Wolfgang Minker | Stefan Ultes
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Despite the remarkable progress in the field of computational argumentation, dialogue systems concerned with argumentative tasks often rely on structured knowledge about arguments and their relations. Since the manual acquisition of these argument structures is highly time-consuming, the corresponding systems are inflexible regarding the topics they can discuss. To address this issue, we propose a combination of argumentative dialogue systems with argument search technology that enables a system to discuss any topic on which the search engine is able to find suitable arguments. Our approach utilizes supervised learning-based relation classification to map the retrieved arguments into a general tree structure for use in dialogue systems. We evaluate the approach with a state-of-the-art search engine and a recently introduced dialogue model in an extensive user study with respect to dialogue coherence. The results vary between the investigated topics (and hence depend on the quality of the underlying data) but are in some instances surprisingly close to the results achieved with a manually annotated argument structure.

Blending Task Success and User Satisfaction: Analysis of Learned Dialogue Behaviour with Multiple Rewards
Stefan Ultes | Wolfgang Maier
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

In recent work, task success and user satisfaction have been used independently as the principal reward components for dialogue policy reinforcement learning, yet neither has the resulting learned behaviour been analysed nor has a suitable analysis method existed. In this work, we employ both principal reward components jointly and propose a method to analyse the resulting behaviour through a structured way of probing the learned policy. We show that blending both reward components increases user satisfaction without sacrificing task success in more hostile environments, and we provide insight into the actions chosen by the learned policies.

2020

Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Olivier Pietquin | Smaranda Muresan | Vivian Chen | Casey Kennington | David Vandyke | Nina Dethlefs | Koji Inoue | Erik Ekstedt | Stefan Ultes
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Similarity Scoring for Dialogue Behaviour Comparison
Stefan Ultes | Wolfgang Maier
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

The differences in decision making between behavioural models of voice interfaces are hard to capture using existing measures for the absolute performance of such models. For instance, two models may have a similar task success rate, but very different ways of getting there. In this paper, we propose a general methodology to compute the similarity of two dialogue behaviour models and investigate different ways of computing scores on both the semantic and the textual level. Complementing absolute measures of performance, we test our scores on three different tasks and show the practical usability of the measures.

Evaluation of Argument Search Approaches in the Context of Argumentative Dialogue Systems
Niklas Rach | Yuki Matsuda | Johannes Daxenberger | Stefan Ultes | Keiichi Yasumoto | Wolfgang Minker
Proceedings of the 12th Language Resources and Evaluation Conference

We present an approach to evaluate argument search techniques in view of their use in argumentative dialogue systems by assessing quality aspects of the retrieved arguments. To this end, we introduce a dialogue system that presents arguments to users by means of a virtual avatar and synthetic speech and allows them to rate the presented content in four different categories (Interesting, Convincing, Comprehensible, Relation). The approach is applied in a user study in order to compare two state-of-the-art argument search engines to each other and with a system based on traditional web search. The results show a significant advantage of the two search engines over the baseline. Moreover, the two search engines show significant advantages over each other in different categories, thereby reflecting strengths and weaknesses of the different underlying techniques.

Estimating User Communication Styles for Spoken Dialogue Systems
Juliana Miehle | Isabel Feustel | Julia Hornauer | Wolfgang Minker | Stefan Ultes
Proceedings of the 12th Language Resources and Evaluation Conference

We present a neural network approach to estimate the communication style of spoken interaction, namely the stylistic variations elaborateness and directness, and investigate which types of input features to the estimator are necessary to achieve good performance. First, we describe our annotated corpus of recordings in the health care domain and analyse the corpus statistics in terms of agreement, correlation and reliability of the ratings. We use this corpus to estimate the elaborateness and the directness of each utterance. We test different feature sets consisting of dialogue act features, grammatical features and linguistic features as input for our classifier and perform classification into two and three classes. Our classifiers use only features that can be automatically derived during an ongoing interaction in any spoken dialogue system without any prior annotation. Our results show that elaborateness can be classified using only the dialogue act and the number of words contained in the corresponding utterance. Directness is a more difficult classification task, and additional linguistic features in the form of word embeddings improve the classification results. Afterwards, we run a comparison with a support vector machine and a recurrent neural network classifier.

Comparative Study of Sentence Embeddings for Contextual Paraphrasing
Louisa Pragst | Wolfgang Minker | Stefan Ultes
Proceedings of the 12th Language Resources and Evaluation Conference

Paraphrasing is an important aspect of natural-language generation that can produce more variety in the way specific content is presented. Traditionally, paraphrasing has focused on finding different words that convey the same meaning. However, in human-human interaction, we regularly express our intention with phrases that are vastly different regarding both word content and syntactic structure. Instead of exchanging only individual words, the complete surface realisation of a sentence is altered while still preserving its meaning and function in a conversation. This kind of contextual paraphrasing has not yet received much attention from the scientific community despite its potential for the creation of more varied dialogues. In this work, we evaluate several existing approaches to sentence encoding with regard to their ability to capture such context-dependent paraphrasing. To this end, we define a paraphrase classification task that incorporates contextual paraphrases, perform dialogue act clustering, and determine the performance of the sentence embeddings in a sentence swapping task.

2019

Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue
Satoshi Nakamura | Milica Gasic | Ingrid Zuckerman | Gabriel Skantze | Mikio Nakano | Alexandros Papangelis | Stefan Ultes | Koichiro Yoshino
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning
Stefan Ultes
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning employs an objective measure like task success for modelling the reward signal, we use a reward based on user satisfaction estimation. We propose a novel estimator and show that it outperforms all previous estimators while learning temporal dependencies implicitly. Furthermore, we apply this novel user satisfaction estimation model live in simulated experiments where the satisfaction estimation model is trained on one domain and applied in many other domains which cover a similar task. We show that applying this model results in higher estimated satisfaction, similar task success rates and higher robustness to noise.

2018

MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling
Paweł Budzianowski | Tsung-Hsien Wen | Bo-Hsiang Tseng | Iñigo Casanueva | Stefan Ultes | Osman Ramadan | Milica Gašić
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Even though machine learning has come to dominate the dialogue research community, a real breakthrough has been held back by the limited scale of available data. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning multiple domains and topics. At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora. The contribution of this work apart from the open-sourced dataset is two-fold: firstly, a detailed description of the data collection procedure along with a summary of data structure and analysis is provided. The proposed data-collection pipeline is entirely based on crowd-sourcing without the need of hiring professional annotators; secondly, a set of benchmark results of belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies.

Feudal Reinforcement Learning for Dialogue Management in Large Domains
Iñigo Casanueva | Paweł Budzianowski | Pei-Hao Su | Stefan Ultes | Lina M. Rojas-Barahona | Bo-Hsiang Tseng | Milica Gašić
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Reinforcement learning (RL) is a promising approach to solve dialogue policy optimisation. Traditional RL algorithms, however, fail to scale to large domains due to the curse of dimensionality. We propose a novel Dialogue Management architecture, based on Feudal RL, which decomposes the decision into two steps: a first step where a master policy selects a subset of primitive actions, and a second step where a primitive action is chosen from the selected subset. The structural information included in the domain ontology is used to abstract the dialogue state space, taking the decisions at each step using different parts of the abstracted state. This, combined with an information sharing mechanism between slots, increases the scalability to large domains. We show that an implementation of this approach, based on Deep Q-Networks, significantly outperforms the previous state of the art in several dialogue domains and environments, without the need of any additional reward signal.
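
The two-step decision described above can be illustrated with a toy sketch. It assumes hand-written Q-functions in place of the Deep Q-Networks and hypothetical action names and subsets; it omits the ontology-based state abstraction and the information sharing between slots, so it shows only how the master choice restricts the set of primitive actions.

```python
from typing import Callable, Dict, List

# Hypothetical action subsets; in the paper these derive from the domain ontology.
ACTION_SUBSETS: Dict[str, List[str]] = {
    "slot_independent": ["hello", "inform", "bye"],
    "slot_dependent": ["request_food", "request_area", "confirm_food", "confirm_area"],
}

State = Dict[str, float]
QFunction = Callable[[State, str], float]


def feudal_decide(state: State, master_q: QFunction, worker_q: QFunction) -> str:
    """Greedy two-step decision: master picks a subset, worker picks an action in it."""
    # Step 1: the master policy scores each subset and keeps the best one.
    subset = max(ACTION_SUBSETS, key=lambda name: master_q(state, name))
    # Step 2: the primitive action is chosen only among the selected subset.
    return max(ACTION_SUBSETS[subset], key=lambda action: worker_q(state, action))


# Toy Q-functions standing in for learned networks (hypothetical values).
def toy_master_q(state: State, subset: str) -> float:
    return state.get("uncertainty", 0.0) if subset == "slot_dependent" else 0.5


def toy_worker_q(state: State, action: str) -> float:
    return {"hello": 5.0, "request_area": 2.0, "confirm_food": 1.0}.get(action, 0.0)


print(feudal_decide({"uncertainty": 0.9}, toy_master_q, toy_worker_q))
print(feudal_decide({"uncertainty": 0.1}, toy_master_q, toy_worker_q))
```

Note that in the first call "hello" has the highest worker score overall but cannot be chosen, because the master policy has already restricted the decision to the slot-dependent subset.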

Changing the Level of Directness in Dialogue using Dialogue Vector Models and Recurrent Neural Networks
Louisa Pragst | Stefan Ultes
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

In cooperative dialogues, identifying the intent of one's conversation partner and acting accordingly is of great importance. While this endeavour is facilitated by phrasing intentions as directly as possible, we can observe in human-human communication that a number of factors such as cultural norms and politeness may result in expressing one’s intent indirectly. Therefore, in human-computer communication we have to anticipate the possibility of users being indirect and be prepared to interpret their actual meaning. Furthermore, a dialogue system should be able to conform to human expectations by adjusting the degree of directness it uses to improve the user experience. To reach those goals, we propose an approach to differentiate between direct and indirect utterances and find utterances of the opposite characteristic that express the same intent. In this endeavour, we employ dialogue vector models and recurrent neural networks.

Addressing Objects and Their Relations: The Conversational Entity Dialogue Model
Stefan Ultes | Paweł Budzianowski | Iñigo Casanueva | Lina M. Rojas-Barahona | Bo-Hsiang Tseng | Yen-Chen Wu | Steve Young | Milica Gašić
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Statistical spoken dialogue systems usually rely on a single- or multi-domain dialogue model that is restricted in its capability to model complex dialogue structures, e.g., relations. In this work, we propose a novel dialogue model that is centred around entities and is able to model relations as well as multiple entities of the same type. In a prototype implementation, we demonstrate the benefits of relation modelling on the dialogue level and show that a trained policy using these relations outperforms the multi-domain baseline. Furthermore, we show that by modelling the relations on the dialogue level, the system is capable of processing relations present in the user input and even learns to address them in the system response.

Feudal Dialogue Management with Jointly Learned Feature Extractors
Iñigo Casanueva | Paweł Budzianowski | Stefan Ultes | Florian Kreyssig | Bo-Hsiang Tseng | Yen-chen Wu | Milica Gašić
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Reinforcement learning (RL) is a promising dialogue policy optimisation approach, but traditional RL algorithms fail to scale to large domains. Recently, Feudal Dialogue Management (FDM) has been shown to increase the scalability to large domains by decomposing the dialogue management decision into two steps, making use of the domain ontology to abstract the dialogue state in each step. In order to abstract the state space, however, previous work on FDM relies on handcrafted feature functions. In this work, we show that these feature functions can be learned jointly with the policy model while obtaining similar performance, even outperforming the handcrafted features in several environments and domains.

Variational Cross-domain Natural Language Generation for Spoken Dialogue Systems
Bo-Hsiang Tseng | Florian Kreyssig | Paweł Budzianowski | Iñigo Casanueva | Yen-Chen Wu | Stefan Ultes | Milica Gašić
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

Cross-domain natural language generation (NLG) is still a difficult task within spoken dialogue modelling. Given a semantic representation provided by the dialogue manager, the language generator should generate sentences that convey the desired information. Traditional template-based generators can produce sentences with all necessary information, but these sentences are not sufficiently diverse. With RNN-based models, the diversity of the generated sentences can be high; however, some information is lost in the process. In this work, we improve an RNN-based generator by considering latent information at the sentence level during generation using a conditional variational auto-encoder architecture. We demonstrate that our model outperforms the original RNN-based generator, while yielding highly diverse sentences. In addition, our model performs better when the training data is limited.

Deep learning for language understanding of mental health concepts derived from Cognitive Behavioural Therapy
Lina M. Rojas-Barahona | Bo-Hsiang Tseng | Yinpei Dai | Clare Mansfield | Osman Ramadan | Stefan Ultes | Michael Crawford | Milica Gašić
Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis

In recent years, we have seen deep learning and distributed representations of words and sentences make an impact on a number of natural language processing tasks, such as similarity, entailment and sentiment analysis. Here we introduce a new task: understanding of mental health concepts derived from Cognitive Behavioural Therapy (CBT). We define a mental health ontology based on the CBT principles, annotate a large corpus where these phenomena are exhibited and perform understanding using deep learning and distributed representations. Our results show that deep learning models combined with word embeddings or sentence embeddings significantly outperform non-deep-learning models in this difficult task. This understanding module will be an essential component of a statistical dialogue system delivering therapy.

Expert Evaluation of a Spoken Dialogue System in a Clinical Operating Room
Juliana Miehle | Nadine Gerstenlauer | Daniel Ostler | Hubertus Feußner | Wolfgang Minker | Stefan Ultes
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

On the Vector Representation of Utterances in Dialogue Context
Louisa Pragst | Niklas Rach | Wolfgang Minker | Stefan Ultes
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

What Causes the Differences in Communication Styles? A Multicultural Study on Directness and Elaborateness
Juliana Miehle | Wolfgang Minker | Stefan Ultes
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Acquisition and Assessment of Semantic Content for the Generation of Elaborateness and Indirectness in Spoken Dialogue Systems
Louisa Pragst | Koichiro Yoshino | Wolfgang Minker | Satoshi Nakamura | Stefan Ultes
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In a dialogue system, the dialogue manager selects one of several system actions and thereby determines the system’s behaviour. Defining all possible system actions in a dialogue system by hand is tedious work. While efforts have been made to automatically generate such system actions, those approaches are mostly focused on providing functional system behaviour. Adapting the system behaviour to the user becomes a difficult task due to the limited number of system actions available. We aim to increase the adaptability of a dialogue system by automatically generating variants of system actions. In this work, we introduce an approach to automatically generate action variants for elaborateness and indirectness. Our proposed algorithm extracts RDF triplets from a knowledge base and rates their relevance to the original system action to find suitable content. We show that the results of our algorithm are mostly perceived similarly to human-generated elaborateness and indirectness and can be used to adapt a conversation to the current user and situation. We also discuss where the results of our algorithm are still lacking and how this could be improved: taking into account the conversation topic as well as the culture of the user is likely to have a beneficial effect on the user’s perception.

PyDial: A Multi-domain Statistical Dialogue System Toolkit
Stefan Ultes | Lina M. Rojas-Barahona | Pei-Hao Su | David Vandyke | Dongho Kim | Iñigo Casanueva | Paweł Budzianowski | Nikola Mrkšić | Tsung-Hsien Wen | Milica Gašić | Steve Young
Proceedings of ACL 2017, System Demonstrations

Reward-Balancing for Statistical Spoken Dialogue Systems using Multi-objective Reinforcement Learning
Stefan Ultes | Paweł Budzianowski | Iñigo Casanueva | Nikola Mrkšić | Lina M. Rojas-Barahona | Pei-Hao Su | Tsung-Hsien Wen | Milica Gašić | Steve Young
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Reinforcement learning is widely used for dialogue policy optimization where the reward function often consists of more than one component, e.g., the dialogue success and the dialogue length. In this work, we propose a structured method for finding a good balance between these components by searching for the optimal reward component weighting. To render this search feasible, we use multi-objective reinforcement learning to significantly reduce the number of training dialogues required. We apply our proposed method to find optimized component weights for six domains and compare them to a default baseline.
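
The abstract above describes weighting reward components such as dialogue success and dialogue length. The sketch below assumes a simple linear scalarisation with a binary success component and a per-turn length penalty; the weights and the two example dialogues are hypothetical. It shows why the weighting matters: changing the weight on success flips which dialogue earns the higher reward. It does not implement the multi-objective reinforcement learning used to make the weight search feasible.

```python
def scalarised_reward(success: bool, num_turns: int, w_success: float, w_length: float) -> float:
    """Weighted sum of a success component and a (negative) dialogue-length component."""
    success_component = 1.0 if success else 0.0
    length_component = -float(num_turns)  # longer dialogues are penalised
    return w_success * success_component + w_length * length_component


# Two hypothetical dialogues: a long successful one and a short failed one.
dialogues = {"long_success": (True, 12), "short_failure": (False, 4)}

for weights in [(20.0, 1.0), (5.0, 1.0)]:
    rewards = {name: scalarised_reward(s, t, *weights) for name, (s, t) in dialogues.items()}
    best = max(rewards, key=rewards.get)
    print(weights, rewards, "->", best)
```

With a success weight of 20 the long successful dialogue scores 8.0 against -4.0, whereas with a success weight of 5 it scores -7.0 and the short failed dialogue is preferred, which is the kind of imbalance a structured weight search is meant to avoid.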

Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning
Paweł Budzianowski | Stefan Ultes | Pei-Hao Su | Nikola Mrkšić | Tsung-Hsien Wen | Iñigo Casanueva | Lina M. Rojas-Barahona | Milica Gašić
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Human conversation is inherently complex, often spanning many different topics/domains. This makes policy learning for dialogue systems very challenging. Standard flat reinforcement learning methods do not provide an efficient framework for modelling such dialogues. In this paper, we focus on the under-explored problem of multi-domain dialogue management. First, we propose a new method for hierarchical reinforcement learning using the option framework. Next, we show that the proposed architecture learns faster and arrives at a better policy than the existing flat ones do. Moreover, we show how pretrained policies can be adapted to more complex systems with an additional set of new actions. In doing that, we show that our approach has the potential to facilitate policy optimisation for more sophisticated multi-domain dialogue systems.

Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management
Pei-Hao Su | Paweł Budzianowski | Stefan Ultes | Milica Gašić | Steve Young
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Deep reinforcement learning (RL) methods have significant potential for dialogue policy optimisation. However, they suffer from poor performance in the early stages of learning. This is especially problematic for on-line learning with real users. Two approaches are introduced to tackle this problem. Firstly, to speed up the learning process, two sample-efficient neural network algorithms, trust region actor-critic with experience replay (TRACER) and episodic natural actor-critic with experience replay (eNACER), are presented. For TRACER, the trust region helps to control the learning step size and avoid catastrophic model changes. For eNACER, the natural gradient identifies the steepest ascent direction in policy space to speed up the convergence. Both models employ off-policy learning with experience replay to improve sample-efficiency. Secondly, to mitigate the cold start issue, a corpus of demonstration data is utilised to pre-train the models prior to on-line reinforcement learning. Combining these two approaches, we demonstrate a practical approach to learn deep RL-based dialogue policies and demonstrate their effectiveness in a task-oriented information seeking domain.

Interaction Quality Estimation Using Long Short-Term Memories
Niklas Rach | Wolfgang Minker | Stefan Ultes
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

For estimating the Interaction Quality (IQ) in Spoken Dialogue Systems (SDS), the dialogue history is of significant importance. Previous work included this information manually in the form of precomputed temporal features in the classification process. Here, we employ a deep learning architecture based on Long Short-Term Memories (LSTM) to extract this information automatically from the data, thus estimating IQ solely by using current exchange features. We show that it is thereby possible to achieve results competitive with those obtained when manually optimized temporal features are included.

DialPort, Gone Live: An Update After A Year of Development
Kyusong Lee | Tiancheng Zhao | Yulun Du | Edward Cai | Allen Lu | Eli Pincus | David Traum | Stefan Ultes | Lina M. Rojas-Barahona | Milica Gasic | Steve Young | Maxine Eskenazi
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

DialPort collects user data for connected spoken dialog systems. At present six systems are linked to a central portal that directs the user to the applicable system and suggests systems that the user may be interested in. User data has started to flow into the system.

A Network-based End-to-End Trainable Task-oriented Dialogue System
Tsung-Hsien Wen | David Vandyke | Nikola Mrkšić | Milica Gašić | Lina M. Rojas-Barahona | Pei-Hao Su | Stefan Ultes | Steve Young
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Teaching machines to accomplish tasks by conversing naturally with humans is challenging. Currently, developing task-oriented dialogue systems requires creating multiple components, and typically this involves either a large amount of handcrafting or acquiring costly labelled datasets to solve a statistical learning problem for each component. In this work we introduce a neural network-based text-in, text-out end-to-end trainable goal-oriented dialogue system along with a new way of collecting dialogue data based on a novel pipelined Wizard-of-Oz framework. This approach allows us to develop dialogue systems easily and without making too many assumptions about the task at hand. The results show that the model can converse with human subjects naturally whilst helping them to accomplish tasks in a restaurant search domain.

2016

Cultural Communication Idiosyncrasies in Human-Computer Interaction
Juliana Miehle | Koichiro Yoshino | Louisa Pragst | Stefan Ultes | Satoshi Nakamura | Wolfgang Minker
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Automatic Modification of Communication Style in Dialogue Management
Louisa Pragst | Juliana Miehle | Stefan Ultes | Wolfgang Minker
Proceedings of the INLG 2016 Workshop on Computational Creativity in Natural Language Generation

Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding
Lina M. Rojas-Barahona | Milica Gašić | Nikola Mrkšić | Pei-Hao Su | Stefan Ultes | Tsung-Hsien Wen | Steve Young
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper presents a deep learning architecture for the semantic decoder component of a Statistical Spoken Dialogue System. In a slot-filling dialogue, the semantic decoder predicts the dialogue act and a set of slot-value pairs from a set of n-best hypotheses returned by the Automatic Speech Recognition. Most current models for spoken language understanding assume (i) word-aligned semantic annotations as in sequence taggers and (ii) delexicalisation, or a mapping of input words to domain-specific concepts using heuristics that try to capture morphological variation but that do not scale to other domains nor to language variation (e.g., morphology, synonyms, paraphrasing). In this work the semantic decoder is trained using unaligned semantic annotations, and it uses distributed semantic representation learning to overcome the limitations of explicit delexicalisation. The proposed architecture uses a convolutional neural network for the sentence representation and a long short-term memory network for the context representation. Results are presented for the publicly available DSTC2 corpus and an In-car corpus which is similar to DSTC2 but has a significantly higher word error rate (WER).

Conditional Generation and Snapshot Learning in Neural Dialogue Systems
Tsung-Hsien Wen | Milica Gašić | Nikola Mrkšić | Lina M. Rojas-Barahona | Pei-Hao Su | Stefan Ultes | David Vandyke | Steve Young
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems
Pei-Hao Su | Milica Gašić | Nikola Mrkšić | Lina M. Rojas-Barahona | Stefan Ultes | David Vandyke | Tsung-Hsien Wen | Steve Young
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Quality-adaptive Spoken Dialogue Initiative Selection And Implications On Reward Modelling
Stefan Ultes | Matthias Kraus | Alexander Schmitt | Wolfgang Minker
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

First Insight into Quality-Adaptive Dialogue
Stefan Ultes | Hüseyin Dikme | Wolfgang Minker
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

While Spoken Dialogue Systems have gained in importance in recent years, most systems applied in the real world are still static and error-prone. To overcome this, the user is put into the focus of dialogue management. Hence, an approach for adapting the course of the dialogue to Interaction Quality (IQ), an objective variant of user satisfaction, is presented in this work. In general, rendering the dialogue adaptive to user satisfaction enables the dialogue system to improve the course of the dialogue and to handle problematic situations better. In this contribution, we present a pilot study of quality-adaptive dialogue. By selecting the confirmation strategy based on the current IQ value, the course of the dialogue is adapted in order to improve the overall user experience. In a user experiment comparing three different confirmation strategies in a train booking domain, the adaptive strategy performs successfully and is among the two best-rated strategies based on the overall user experience.
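
As an illustration of selecting a confirmation strategy from the current IQ value, the sketch below maps an IQ score on the 1 to 5 scale used for the Let's Go annotation to either explicit or implicit confirmation. The threshold of 3, the two strategy names and the example prompts are hypothetical; the abstract does not specify the mapping used in the study.

```python
from enum import Enum


class Confirmation(Enum):
    # Hypothetical strategies with example prompts from a train booking domain.
    EXPLICIT = "explicit"  # e.g. "Did you say Berlin?"
    IMPLICIT = "implicit"  # e.g. "A train to Berlin. When do you want to leave?"


def select_confirmation(iq: int, threshold: int = 3) -> Confirmation:
    """Pick a confirmation strategy from the current IQ value (1 = worst, 5 = best)."""
    if not 1 <= iq <= 5:
        raise ValueError("IQ must be on the 1-5 scale")
    # Low interaction quality hints at problems, so confirm more cautiously.
    return Confirmation.EXPLICIT if iq < threshold else Confirmation.IMPLICIT


for iq in (1, 2, 3, 5):
    print(iq, select_confirmation(iq).value)
```

A single threshold keeps the rule easy to inspect; a learned or multi-level mapping would be a natural alternative.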

Comparison of Gender- and Speaker-adaptive Emotion Recognition
Maxim Sidorov | Stefan Ultes | Alexander Schmitt
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Deriving the emotion of a human speaker is a hard task, especially if only the audio stream is taken into account. While state-of-the-art approaches already provide good results, adaptive methods have been proposed in order to further improve the recognition accuracy. A recent approach is to add characteristics of the speaker, e.g., the gender of the speaker. In this contribution, we argue that adding information unique to each speaker, i.e., by using speaker identification techniques, improves emotion recognition simply by adding this additional information to the feature vector of the statistical classification algorithm. Moreover, we compare this approach to emotion recognition that adds only the speaker's gender, which is a non-unique speaker attribute. We justify this by performing adaptive emotion recognition using both gender and speaker information on four corpora in different languages containing acted and non-acted speech. The final results show that adding speaker information significantly outperforms both adding gender information and solely using a generic speaker-independent approach.

Interaction Quality Estimation in Spoken Dialogue Systems Using Hybrid-HMMs
Stefan Ultes | Wolfgang Minker
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2013

On Quality Ratings for Spoken Dialogue Systems – Experts vs. Users
Stefan Ultes | Alexander Schmitt | Wolfgang Minker
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Improving Interaction Quality Recognition Using Error Correction
Stefan Ultes | Wolfgang Minker
Proceedings of the SIGDIAL 2013 Conference

2012

Towards Quality-Adaptive Spoken Dialogue Management
Stefan Ultes | Alexander Schmitt | Wolfgang Minker
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)

A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System
Alexander Schmitt | Stefan Ultes | Wolfgang Minker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Standardized corpora are the foundation for spoken language research. In this work, we introduce an annotated and standardized corpus in the Spoken Dialog Systems (SDS) domain. Data from the Let's Go Bus Information System of Carnegie Mellon University in Pittsburgh, comprising 347 dialogs with 9,083 system-user exchanges, has been formatted, parameterized and annotated with quality, emotion, and task success labels. A total of 46 parameters have been derived automatically and semi-automatically from Automatic Speech Recognition (ASR), Spoken Language Understanding (SLU) and Dialog Manager (DM) properties. Each spoken user utterance has been assigned an emotion label from the set (garbage, non-angry, slightly angry, very angry). In addition, a manual annotation of Interaction Quality (IQ) on the exchange level has been performed by three raters, achieving a Kappa value of 0.54. The IQ score expresses the quality of the interaction up to each system-user exchange on a scale from 1 to 5. The presented corpus is intended as a standardized basis for classification and evaluation tasks regarding task success prediction, dialog quality estimation or emotion recognition, to foster comparability between different approaches in these fields.