Wolfgang Minker

2021

pdf bib abs
Naturalness Evaluation of Natural Language Generation in Task-oriented Dialogues Using BERT
Ye Liu | Wolfgang Maier | Wolfgang Minker | Stefan Ultes
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

This paper presents an automatic method to evaluate the naturalness of natural language generation in dialogue systems. While this task was previously rendered through expensive and time-consuming human labor, we present this novel task of automatic naturalness evaluation of generated language. By fine-tuning the BERT model, our proposed naturalness evaluation method shows robust results and outperforms the baselines: support vector machines, bi-directional LSTMs, and BLEURT. In addition, the training speed and evaluation performance of naturalness model are improved by transfer learning from quality and informativeness linguistic knowledge.

pdf bib abs
From Argument Search to Argumentative Dialogue: A Topic-independent Approach to Argument Acquisition for Dialogue Systems
Niklas Rach | Carolin Schindler | Isabel Feustel | Johannes Daxenberger | Wolfgang Minker | Stefan Ultes
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Despite the remarkable progress in the field of computational argumentation, dialogue systems concerned with argumentative tasks often rely on structured knowledge about arguments and their relations. Since the manual acquisition of these argument structures is highly time-consuming, the corresponding systems are inflexible regarding the topics they can discuss. To address this issue, we propose a combination of argumentative dialogue systems with argument search technology that enables a system to discuss any topic on which the search engine is able to find suitable arguments. Our approach utilizes supervised learning-based relation classification to map the retrieved arguments into a general tree structure for use in dialogue systems. We evaluate the approach with a state of the art search engine and a recently introduced dialogue model in an extensive user study with respect to the dialogue coherence. The results vary between the investigated topics (and hence depend on the quality of the underlying data) but are in some instances surprisingly close to the results achieved with a manually annotated argument structure.

2020

pdf bib abs
A Comparison of Explicit and Implicit Proactive Dialogue Strategies for Conversational Recommendation
Matthias Kraus | Fabian Fischbach | Pascal Jansen | Wolfgang Minker
Proceedings of the 12th Language Resources and Evaluation Conference

Recommendation systems aim at facilitating information retrieval for users by taking into account their preferences. Based on previous user behaviour, such a system suggests items or provides information that a user might like or find useful. Nonetheless, how to provide suggestions is still an open question. Depending on the way a recommendation is communicated influences the user’s perception of the system. This paper presents an empirical study on the effects of proactive dialogue strategies on user acceptance. Therefore, an explicit strategy based on user preferences provided directly by the user, and an implicit proactive strategy, using autonomously gathered information, are compared. The results show that proactive dialogue systems significantly affect the perception of human-computer interaction. Although no significant differences are found between implicit and explicit strategies, proactivity significantly influences the user experience compared to reactive system behaviour. The study contributes new insights to the human-agent interaction and the voice user interface design. Furthermore, we discover interesting tendencies that motivate futurework.

pdf bib abs
How Users React to Proactive Voice Assistant Behavior While Driving
Maria Schmidt | Wolfgang Minker | Steffen Werner
Proceedings of the 12th Language Resources and Evaluation Conference

Nowadays Personal Assistants (PAs) are available in multiple environments and become increasingly popular to use via voice. Therefore, we aim to provide proactive PA suggestions to car drivers via speech. These suggestions should be neither obtrusive nor increase the drivers’ cognitive load, while enhancing user experience. To assess these factors, we conducted a usability study in which 42 participants perceive proactive voice output in a Wizard-of-Oz study in a driving simulator. Traffic density was varied during a highway drive and it included six in-car-specific use cases. The latter were presented by a proactive voice assistant and in a non-proactive control condition. We assessed the users’ subjective cognitive load and their satisfaction in different questionnaires during the interaction with both PA variants. Furthermore, we analyze the user reactions: both regarding their content and the elapsed response times to PA actions. The results show that proactive assistant behavior is rated similarly positive as non-proactive behavior. Furthermore, the participants agreed to 73.8% of proactive suggestions. In line with previous research, driving-relevant use cases receive the best ratings, here we reach 82.5% acceptance. Finally, the users reacted significantly faster to proactive PA actions, which we interpret as less cognitive load compared to non-proactive behavior.

pdf bib abs
Evaluation of Argument Search Approaches in the Context of Argumentative Dialogue Systems
Niklas Rach | Yuki Matsuda | Johannes Daxenberger | Stefan Ultes | Keiichi Yasumoto | Wolfgang Minker
Proceedings of the 12th Language Resources and Evaluation Conference

We present an approach to evaluate argument search techniques in view of their use in argumentative dialogue systems by assessing quality aspects of the retrieved arguments. To this end, we introduce a dialogue system that presents arguments by means of a virtual avatar and synthetic speech to users and allows them to rate the presented content in four different categories (Interesting, Convincing, Comprehensible, Relation). The approach is applied in a user study in order to compare two state of the art argument search engines to each other and with a system based on traditional web search. The results show a significant advantage of the two search engines over the baseline. Moreover, the two search engines show significant advantages over each other in different categories, thereby reflecting strengths and weaknesses of the different underlying techniques.

pdf bib abs
Estimating User Communication Styles for Spoken Dialogue Systems
Juliana Miehle | Isabel Feustel | Julia Hornauer | Wolfgang Minker | Stefan Ultes
Proceedings of the 12th Language Resources and Evaluation Conference

We present a neural network approach to estimate the communication style of spoken interaction, namely the stylistic variations elaborateness and directness, and investigate which type of input features to the estimator are necessary to achive good performance. First, we describe our annotated corpus of recordings in the health care domain and analyse the corpus statistics in terms of agreement, correlation and reliability of the ratings. We use this corpus to estimate the elaborateness and the directness of each utterance. We test different feature sets consisting of dialogue act features, grammatical features and linguistic features as input for our classifier and perform classification in two and three classes. Our classifiers use only features that can be automatically derived during an ongoing interaction in any spoken dialogue system without any prior annotation. Our results show that the elaborateness can be classified by only using the dialogue act and the amount of words contained in the corresponding utterance. The directness is a more difficult classification task and additional linguistic features in form of word embeddings improve the classification results. Afterwards, we run a comparison with a support vector machine and a recurrent neural network classifier.

pdf bib abs
Comparative Study of Sentence Embeddings for Contextual Paraphrasing
Louisa Pragst | Wolfgang Minker | Stefan Ultes
Proceedings of the 12th Language Resources and Evaluation Conference

Paraphrasing is an important aspect of natural-language generation that can produce more variety in the way specific content is presented. Traditionally, paraphrasing has been focused on finding different words that convey the same meaning. However, in human-human interaction, we regularly express our intention with phrases that are vastly different regarding both word content and syntactic structure. Instead of exchanging only individual words, the complete surface realisation of a sentences is altered while still preserving its meaning and function in a conversation. This kind of contextual paraphrasing did not yet receive a lot of attention from the scientific community despite its potential for the creation of more varied dialogues. In this work, we evaluate several existing approaches to sentence encoding with regard to their ability to capture such context-dependent paraphrasing. To this end, we define a paraphrase classification task that incorporates contextual paraphrases, perform dialogue act clustering, and determine the performance of the sentence embeddings in a sentence swapping task.

2019

pdf bib abs
Cross-Corpus Data Augmentation for Acoustic Addressee Detection
Oleg Akhtiamov | Ingo Siegert | Alexey Karpov | Wolfgang Minker
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Acoustic addressee detection (AD) is a modern paralinguistic and dialogue challenge that especially arises in voice assistants. In the present study, we distinguish addressees in two settings (a conversation between several people and a spoken dialogue system, and a conversation between several adults and a child) and introduce the first competitive baseline (unweighted average recall equals 0.891) for the Voice Assistant Conversation Corpus that models the first setting. We jointly solve both classification problems, using three models: a linear support vector machine dealing with acoustic functionals and two neural networks utilising raw waveforms alongside with acoustic low-level descriptors. We investigate how different corpora influence each other, applying the mixup approach to data augmentation. We also study the influence of various acoustic context lengths on AD. Two-second speech fragments turn out to be sufficient for reliable AD. Mixup is shown to be beneficial for merging acoustic data (extracted features but not raw waveforms) from different domains that allows us to reach a higher classification performance on human-machine AD and also for training a multipurpose neural network that is capable of solving both human-machine and adult-child AD problems.

2018

pdf bib
Effects of Gender Stereotypes on Trust and Likability in Spoken Human-Robot Interaction
Matthias Kraus | Johannes Kraus | Martin Baumann | Wolfgang Minker
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Expert Evaluation of a Spoken Dialogue System in a Clinical Operating Room
Juliana Miehle | Nadine Gerstenlauer | Daniel Ostler | Hubertus Feußner | Wolfgang Minker | Stefan Ultes
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
On the Vector Representation of Utterances in Dialogue Context
Louisa Pragst | Niklas Rach | Wolfgang Minker | Stefan Ultes
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Contextual Dependencies in Time-Continuous Multidimensional Affect Recognition
Dmitrii Fedotov | Denis Ivanko | Maxim Sidorov | Wolfgang Minker
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
What Causes the Differences in Communication Styles? A Multicultural Study on Directness and Elaborateness
Juliana Miehle | Wolfgang Minker | Stefan Ultes
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib abs
Acquisition and Assessment of Semantic Content for the Generation of Elaborateness and Indirectness in Spoken Dialogue Systems
Louisa Pragst | Koichiro Yoshino | Wolfgang Minker | Satoshi Nakamura | Stefan Ultes
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In a dialogue system, the dialogue manager selects one of several system actions and thereby determines the system’s behaviour. Defining all possible system actions in a dialogue system by hand is a tedious work. While efforts have been made to automatically generate such system actions, those approaches are mostly focused on providing functional system behaviour. Adapting the system behaviour to the user becomes a difficult task due to the limited amount of system actions available. We aim to increase the adaptability of a dialogue system by automatically generating variants of system actions. In this work, we introduce an approach to automatically generate action variants for elaborateness and indirectness. Our proposed algorithm extracts RDF triplets from a knowledge base and rates their relevance to the original system action to find suitable content. We show that the results of our algorithm are mostly perceived similarly to human generated elaborateness and indirectness and can be used to adapt a conversation to the current user and situation. We also discuss where the results of our algorithm are still lacking and how this could be improved: Taking into account the conversation topic as well as the culture of the user is likely to have beneficial effect on the user’s perception.

pdf bib abs
Interaction Quality Estimation Using Long Short-Term Memories
Niklas Rach | Wolfgang Minker | Stefan Ultes
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

For estimating the Interaction Quality (IQ) in Spoken Dialogue Systems (SDS), the dialogue history is of significant importance. Previous works included this information manually in the form of precomputed temporal features into the classification process. Here, we employ a deep learning architecture based on Long Short-Term Memories (LSTM) to extract this information automatically from the data, thus estimating IQ solely by using current exchange features. We show that it is thereby possible to achieve competitive results as in a scenario where manually optimized temporal features have been included.

2016

pdf bib
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Raquel Fernandez | Wolfgang Minker | Giuseppe Carenini | Ryuichiro Higashinaka | Ron Artstein | Alesia Gainer
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Cultural Communication Idiosyncrasies in Human-Computer Interaction
Juliana Miehle | Koichiro Yoshino | Louisa Pragst | Stefan Ultes | Satoshi Nakamura | Wolfgang Minker
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Automatic Modification of Communication Style in Dialogue Management
Louisa Pragst | Juliana Miehle | Stefan Ultes | Wolfgang Minker
Proceedings of the INLG 2016 Workshop on Computational Creativity in Natural Language Generation

pdf bib abs
Could Speaker, Gender or Age Awareness be beneficial in Speech-based Emotion Recognition?
Maxim Sidorov | Alexander Schmitt | Eugene Semenkin | Wolfgang Minker
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Emotion Recognition (ER) is an important part of dialogue analysis which can be used in order to improve the quality of Spoken Dialogue Systems (SDSs). The emotional hypothesis of the current response of an end-user might be utilised by the dialogue manager component in order to change the SDS strategy which could result in a quality enhancement. In this study additional speaker-related information is used to improve the performance of the speech-based ER process. The analysed information is the speaker identity, gender and age of a user. Two schemes are described here, namely, using additional information as an independent variable within the feature vector and creating separate emotional models for each speaker, gender or age-cluster independently. The performances of the proposed approaches were compared against the baseline ER system, where no additional information has been used, on a number of emotional speech corpora of German, English, Japanese and Russian. The study revealed that for some of the corpora the proposed approach significantly outperforms the baseline methods with a relative difference of up to 11.9%.

pdf bib abs
A Comparative Study of Text Preprocessing Approaches for Topic Detection of User Utterances
Roman Sergienko | Muhammad Shan | Wolfgang Minker
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The paper describes a comparative study of existing and novel text preprocessing and classification techniques for domain detection of user utterances. Two corpora are considered. The first one contains customer calls to a call centre for further call routing; the second one contains answers of call centre employees with different kinds of customer orientation behaviour. Seven different unsupervised and supervised term weighting methods were applied. The collective use of term weighting methods is proposed for classification effectiveness improvement. Four different dimensionality reduction methods were applied: stop-words filtering with stemming, feature selection based on term weights, feature transformation based on term clustering, and a novel feature transformation method based on terms belonging to classes. As classification algorithms we used k-NN and a SVM-based algorithm. The numerical experiments have shown that the simultaneous use of the novel proposed approaches (collectives of term weighting methods and the novel feature transformation method) allows reaching the high classification results with very small number of features.

2015

pdf bib
The Interplay of User-Centered Dialog Systems and AI Planning
Florian Nothdurft | Gregor Behnke | Pascal Bercher | Susanne Biundo | Wolfgang Minker
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Quality-adaptive Spoken Dialogue Initiative Selection And Implications On Reward Modelling
Stefan Ultes | Matthias Kraus | Alexander Schmitt | Wolfgang Minker
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

pdf bib abs
First Insight into Quality-Adaptive Dialogue
Stefan Ultes | Hüseyin Dikme | Wolfgang Minker
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

While Spoken Dialogue Systems have gained in importance in recent years, most systems applied in the real world are still static and error-prone. To overcome this, the user is put into the focus of dialogue management. Hence, an approach for adapting the course of the dialogue to Interaction Quality, an objective variant of user satisfaction, is presented in this work. In general, rendering the dialogue adaptive to user satisfaction enables the dialogue system to improve the course of the dialogue and to handle problematic situations better. In this contribution, we present a pilot study of quality-adaptive dialogue. By selecting the confirmation strategy based on the current IQ value, the course of the dialogue is adapted in order to improve the overall user experience. In a user experiment comparing three different confirmation strategies in a train booking domain, the adaptive strategy performs successful and is among the two best rated strategies based on the overall user experience.

pdf bib abs
Speech-Based Emotion Recognition: Feature Selection by Self-Adaptive Multi-Criteria Genetic Algorithm
Maxim Sidorov | Christina Brester | Wolfgang Minker | Eugene Semenkin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Automated emotion recognition has a number of applications in Interactive Voice Response systems, call centers, etc. While employing existing feature sets and methods for automated emotion recognition has already achieved reasonable results, there is still a lot to do for improvement. Meanwhile, an optimal feature set, which should be used to represent speech signals for performing speech-based emotion recognition techniques, is still an open question. In our research, we tried to figure out the most essential features with self-adaptive multi-objective genetic algorithm as a feature selection technique and a probabilistic neural network as a classifier. The proposed approach was evaluated using a number of multi-languages databases (English, German), which were represented by 37- and 384-dimensional feature sets. According to the obtained results, the developed technique allows to increase the emotion recognition performance by up to 26.08 relative improvement in accuracy. Moreover, emotion recognition performance scores for all applied databases are improved.

pdf bib
Opinion Mining and Topic Categorization with Novel Term Weighting
Tatiana Gasanova | Roman Sergienko | Shakhnaz Akhmedova | Eugene Semenkin | Wolfgang Minker
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
Probabilistic Human-Computer Trust Handling
Florian Nothdurft | Felix Richter | Wolfgang Minker
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

pdf bib
Interaction Quality Estimation in Spoken Dialogue Systems Using Hybrid-HMMs
Stefan Ultes | Wolfgang Minker
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2013

pdf bib
On Quality Ratings for Spoken Dialogue Systems – Experts vs. Users
Stefan Ultes | Alexander Schmitt | Wolfgang Minker
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Improving Interaction Quality Recognition Using Error Correction
Stefan Ultes | Wolfgang Minker
Proceedings of the SIGDIAL 2013 Conference

pdf bib
A Semi-supervised Approach for Natural Language Call Routing
Tatiana Gasanova | Eugene Zhukov | Roman Sergienko | Eugene Semenkin | Wolfgang Minker
Proceedings of the SIGDIAL 2013 Conference

2012

pdf bib
Estimating Adaptation of Dialogue Partners with Different Verbal Intelligence
Kseniya Zablotskaya | Fernando Fernández-Martínez | Wolfgang Minker
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Towards Quality-Adaptive Spoken Dialogue Management
Stefan Ultes | Alexander Schmitt | Wolfgang Minker
NAACL-HLT Workshop on Future directions and needs in the Spoken Dialog Community: Tools and Data (SDCTD 2012)

pdf bib abs
Adaptive Speech Understanding for Intuitive Model-based Spoken Dialogues
Tobias Heinroth | Maximilian Grotz | Florian Nothdurft | Wolfgang Minker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we present three approaches towards adaptive speech understanding. The target system is a model-based Adaptive Spoken Dialogue Manager, the OwlSpeak ASDM. We enhanced this system in order to properly react on non-understandings in real-life situations where intuitive communication is required. OwlSpeak provides a model-based spoken interface to an Intelligent Environment depending on and adapting to the current context. It utilises a set of ontologies used as dialogue models that can be combined dynamically during runtime. Besides the benefits the system showed in practice, real-life evaluations also conveyed some limitations of the model-based approach. Since it is unfeasible to model all variations of the communication between the user and the system beforehand, various situations where the system did not correctly understand the user input have been observed. Thus we present three enhancements towards a more sophisticated use of the ontology-based dialogue models and show how grammars may dynamically be adapted in order to understand intuitive user utterances. The evaluation of our approaches revealed the incorporation of a lexical-semantic knowledgebase into the recognition process to be the most promising approach.

pdf bib abs
Using multimodal resources for explanation approaches in intelligent systems
Florian Nothdurft | Wolfgang Minker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this work we show that there is a need of using multimodal resources during human-computer interaction (HCI) in intelligent systems. We propose that not only creating multimodal output for the user is important, but to take multimodal input resources into account for the decision when and how to interact. Especially the use of multimodal input resources for the decision when and how to provide assistance in HCI is important. The use of assistive functionalities like providing adaptive explanations to keep the user motivated and cooperative is more than a side-effect and demands a closer look. In this paper we introduce our approach on how to use multimodal input ressources in an adaptive and generic explanation pipeline. We do not only concentrate on using explanations as a way to manage user knowledge, but to maintain the cooperativeness, trust and motivation of the user to continue a healthy and well-structured HCI.

pdf bib abs
A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System
Alexander Schmitt | Stefan Ultes | Wolfgang Minker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Standardized corpora are the foundation for spoken language research. In this work, we introduce an annotated and standardized corpus in the Spoken Dialog Systems (SDS) domain. Data from the Let's Go Bus Information System from the Carnegie Mellon University in Pittsburgh has been formatted, parameterized and annotated with quality, emotion, and task success labels containing 347 dialogs with 9,083 system-user exchanges. A total of 46 parameters have been derived automatically and semi-automatically from Automatic Speech Recognition (ASR), Spoken Language Understanding (SLU) and Dialog Manager (DM) properties. To each spoken user utterance an emotion label from the set garbage, non-angry, slightly angry, very angry has been assigned. In addition, a manual annotation of Interaction Quality (IQ) on the exchange level has been performed with three raters achieving a Kappa value of 0.54. The IQ score expresses the quality of the interaction up to each system-user exchange on a score from 1-5. The presented corpus is intended as a standardized basis for classification and evaluation tasks regarding task success prediction, dialog quality estimation or emotion recognition to foster comparability between different approaches on these fields.

pdf bib abs
Investigating Verbal Intelligence Using the TF-IDF Approach
Kseniya Zablotskaya | Fernando Fernández Martínez | Wolfgang Minker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we investigated differences in language use of speakers yielding different verbal intelligence when they describe the same event. The work is based on a corpus containing descriptions of a short film and verbal intelligence scores of the speakers. For analyzing the monologues and the film transcript, the number of reused words, lemmas, n-grams, cosine similarity and other features were calculated and compared to each other for different verbal intelligence groups. The results showed that the similarity of monologues of higher verbal intelligence speakers was greater than of lower and average verbal intelligence participants. A possible explanation of this phenomenon is that candidates yielding higher verbal intelligence have a good short-term memory. In this paper we also checked a hypothesis that differences in vocabulary of speakers yielding different verbal intelligence are sufficient enough for good classification results. For proving this hypothesis, the Nearest Neighbor classifier was trained using TF-IDF vocabulary measures. The maximum achieved accuracy was 92.86%.

pdf bib abs
Relating Dominance of Dialogue Participants with their Verbal Intelligence Scores
Kseniya Zablotskaya | Umair Rahim | Fernando Fernández Martínez | Wolfgang Minker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this work we investigated whether there is a relationship between dominant behaviour of dialogue participants and their verbal intelligence. The analysis is based on a corpus containing 56 dialogues and verbal intelligence scores of the test persons. All the dialogues were divided into three groups: H-H is a group of dialogues between higher verbal intelligence participants, L-L is a group of dialogues between lower verbal intelligence participant and L-H is a group of all the other dialogues. The dominance scores of the dialogue partners from each group were analysed. The analysis showed that differences between dominance scores and verbal intelligence coefficients for L-L were positively correlated. Verbal intelligence scores of the test persons were compared to other features that may reflect dominant behaviour. The analysis showed that number of interruptions, long utterances, times grabbed the floor, influence diffusion model, number of agreements and several acoustic features may be related to verbal intelligence. These features were used for the automatic classification of the dialogue partners into two groups (lower and higher verbal intelligence participants); the achieved accuracy was 89.36%.

pdf bib abs
Speech and Language Resources for LVCSR of Russian
Sergey Zablotskiy | Alexander Shvets | Maxim Sidorov | Eugene Semenkin | Wolfgang Minker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

A syllable-based language model reduces the lexicon size by hundreds of times. It is especially beneficial in case of highly inflective languages like Russian due to the abundance of word forms according to various grammatical categories. However, the main arising challenge is the concatenation of recognised syllables into the originally spoken sentence or phrase, particularly in the presence of syllable recognition mistakes. Natural fluent speech does not usually incorporate clear information about the outside borders of the spoken words. In this paper a method for the syllable concatenation and error correction is suggested and tested. It is based on the designed co-evolutionary asymptotic probabilistic genetic algorithm for the determination of the most likely sentence corresponding to the recognized chain of syllables within an acceptable time frame. The advantage of this genetic algorithm modification is the minimum number of settings to be manually adjusted comparing to the standard algorithm. Data used for acoustic and language modelling are also described here. A special issue is the preprocessing of the textual data, particularly, handling of abbreviations, Arabic and Roman numerals, since their inflection mostly depends on the context and grammar.

2011

pdf bib
Modeling and Predicting Quality in Spoken Human-Computer Interaction
Alexander Schmitt | Benjamin Schatz | Wolfgang Minker
Proceedings of the SIGDIAL 2011 Conference

2010

pdf bib abs
WITcHCRafT: A Workbench for Intelligent exploraTion of Human ComputeR conversaTions
Alexander Schmitt | Gregor Bertrand | Tobias Heinroth | Wolfgang Minker | Jackson Liscombe
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present Witchcraft, an open-source framework for the evaluation of prediction models for spoken dialogue systems based on interaction logs and audio recordings. The use of Witchcraft is two fold: first, it provides an adaptable user interface to easily manage and browse thousands of logged dialogues (e.g. calls). Second, with help of the underlying models and the connected machine learning framework RapidMiner the workbench is able to display at each dialogue turn the probability of the task being completed based on the dialogue history. It estimates the emotional state, gender and age of the user. While browsing through a logged conversation, the user can directly observe the prediction result of the models at each dialogue step. By that, Witchcraft allows for spotting problematic dialogue situations and demonstrates where the current system and the prediction models have design flaws. Witchcraft will be made publically available to the community and will be deployed as open-source project.

pdf bib abs
The Influence of the Utterance Length on the Recognition of Aged Voices
Alexander Schmitt | Tim Polzehl | Wolfgang Minker | Jackson Liscombe
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper addresses the recognition of elderly callers based on short and narrow-band utterances, which are typical for Interactive Voice Response (IVR) systems. Our study is based on 2308 short utterances from a deployed IVR application. We show that features such as speaking rate, jitter and shimmer that are considered as most meaningful ones for determining elderly users underperform when used in the IVR context while pitch and intensity features seem to gain importance. We further demonstrate the influence of the utterance length on the classifiers performance: for both humans and classifier, the distinction between aged and non-aged voices becomes increasingly difficult the shorter the utterances get. Our setup based on a Support Vector Machine (SVM) with linear kernel reaches a comparably poor performance of 58% accuracy, which can be attributed to an average utterance length of only 1.6 seconds. The automatic distinction between aged and non-aged utterances drops to random when the utterance length falls below 1.2 seconds.

pdf bib abs
Efficient Spoken Dialogue Domain Representation and Interpretation
Tobias Heinroth | Dan Denich | Alexander Schmitt | Wolfgang Minker
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We provide a detailed look on the functioning of the OwlSpeak Spoken Dialogue Manager, which is part of the EU-funded project ATRACO. OwlSpeak interprets Spoken Dialogue Ontologies and on this basis generates VoiceXML dialogue snippets. The dialogue snippets can be interpreted by all speech servers that provide VoiceXML support and therefore make the dialogue management independent from the hosting systems providing speech recognition and synthesis. Ontologies are used within the framework of our prototype to represent specific spoken dialogue domains that can dynamically be broadened or tightened during an ongoing dialogue. We provide an exemplary dialogue encoded as OWL model and explain how this model is interpreted by the dialogue manager. The combination of a unified model for dialogue domains and the strict model-view-controller architecture that underlies the dialogue manager lead to an efficient system that allows for a new way of spoken dialogue system development and can be used for further research on adaptive spoken dialogue strategies.

pdf bib abs
Towards Investigating Effective Affective Dialogue Strategies
Gregor Bertrand | Florian Nothdurft | Steffen Walter | Andreas Scheck | Henrik Kessler | Wolfgang Minker
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We describe an experimentalWizard-of-Oz-setup for the integration of emotional strategies into spoken dialogue management. With this setup we seek to evaluate different approaches to emotional dialogue strategies in human computer interaction with a spoken dialogue system. The study aims to analyse what kinds of emotional strategies work best in spoken dialogue management especially facing the problem that users may not be honest about their emotions. Therefore as well direct (user is asked about his state) as indirect (measurements of psychophysiological features) evidence is considered for the evaluation of our strategies.

pdf bib abs
Speech Data Corpus for Verbal Intelligence Estimation
Kseniya Zablotskaya | Steffen Walter | Wolfgang Minker
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The goal of our research is the development of algorithms for automatic estimation of a person's verbal intelligence based on the analysis of transcribed spoken utterances. In this paper we present the corpus of German native speakers' monologues and dialogues about the same topics collected at the University of Ulm, Germany. The monologues were descriptions of two short films; the dialogues were discussions about problems of German education. The data corpus contains the verbal intelligence quotients of each speaker, which were measured with the Hamburg Wechsler Intelligence Test for Adults. In this paper we describe our corpus, why we decided to create it, and how it was collected. We also describe some approaches which can be applied to the transcribed spoken utterances for extraction of different features which could have a correlation with a person's verbal intelligence. The data corpus consists of 71 monologues and 30 dialogues (about 10 hours of audio data).

pdf bib
Advances in the Witchcraft Workbench Project
Alexander Schmitt | Wolfgang Minker | Nada Sharaf
Proceedings of the SIGDIAL 2010 Conference

2008

The PIT corpus is a German multi-media corpus of multi-party dialogues recorded in a Wizard-of-Oz environment at the University of Ulm. The scenario involves two human dialogue partners interacting with a multi-modal dialogue system in the domain of restaurant selection. In this paper we present the characteristics of the data which was recorded in three sessions resulting in a total of 75 dialogues and about 14 hours of audio and video data. The corpus is available at http://www.uni-ulm.de/in/pit.

2006

pdf bib abs
Stochastic Spoken Natural Language Parsing in the Framework of the French MEDIA Evaluation Campaign
Dirk Bühler | Wolfgang Minker
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

A stochastic parsing component has been applied on a French spoken language dialogue corpus, recorded in the framework of the MEDIA evaluation campaign. Realized as an ergodic HMM using Viterbide coding, the parser outputs the most likely semantic representation given a transcribed utterance as input. The semantic sequences used for training and testing have been derived from the semantic representations of the MEDIA corpus. The HMM parameters have been estimated given the word sequences along with their semantic representation. The performance score of the stochastic parser has been automatically determined using the mediaval tool applied to a held out reference corpus. Evaluation results will be presented in the paper.

In this paper we present the setup of an extensive Wizard-of-Oz environment used for the data collection and the development of a dialogue system. The envisioned Perception and Interaction Assistant will act as an independent dialogue partner. Passively observing the dialogue between the two human users with respect to a limited domain, the system should take the initiative and get meaningfully involved in the communication process when required by the conversational situation. The data collection described here involves audio and video data. We aim at building a rich multi-media data corpus to be used as a basis for our research which includes, inter alia, speech and gaze direction recognition, dialogue modelling and proactivity of the system. We further aspire to obtain data with emotional content to perfom research on emotion recognition, psychopysiological and usability analysis.