Tobias Heinroth


2012

Adaptive Speech Understanding for Intuitive Model-based Spoken Dialogues
Tobias Heinroth | Maximilian Grotz | Florian Nothdurft | Wolfgang Minker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we present three approaches towards adaptive speech understanding. The target system is a model-based Adaptive Spoken Dialogue Manager, the OwlSpeak ASDM. We enhanced this system so that it reacts properly to non-understandings in real-life situations where intuitive communication is required. OwlSpeak provides a model-based spoken interface to an Intelligent Environment, depending on and adapting to the current context. It utilises a set of ontologies as dialogue models that can be combined dynamically at runtime. Besides the benefits the system showed in practice, real-life evaluations also revealed some limitations of the model-based approach. Since it is infeasible to model all variations of the communication between the user and the system beforehand, various situations in which the system did not correctly understand the user input have been observed. We therefore present three enhancements towards a more sophisticated use of the ontology-based dialogue models and show how grammars may be adapted dynamically in order to understand intuitive user utterances. The evaluation of our approaches showed the incorporation of a lexical-semantic knowledge base into the recognition process to be the most promising approach.

2010

WITcHCRafT: A Workbench for Intelligent exploraTion of Human ComputeR conversaTions
Alexander Schmitt | Gregor Bertrand | Tobias Heinroth | Wolfgang Minker | Jackson Liscombe
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present Witchcraft, an open-source framework for the evaluation of prediction models for spoken dialogue systems based on interaction logs and audio recordings. The use of Witchcraft is twofold: first, it provides an adaptable user interface to easily manage and browse thousands of logged dialogues (e.g. calls). Second, with the help of the underlying models and the connected machine learning framework RapidMiner, the workbench is able to display at each dialogue turn the probability of the task being completed based on the dialogue history. It also estimates the emotional state, gender and age of the user. While browsing through a logged conversation, the user can directly observe the prediction result of the models at each dialogue step. In this way, Witchcraft allows for spotting problematic dialogue situations and demonstrates where the current system and the prediction models have design flaws. Witchcraft will be made publicly available to the community and will be deployed as an open-source project.

Efficient Spoken Dialogue Domain Representation and Interpretation
Tobias Heinroth | Dan Denich | Alexander Schmitt | Wolfgang Minker
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We provide a detailed look at the functioning of the OwlSpeak Spoken Dialogue Manager, which is part of the EU-funded project ATRACO. OwlSpeak interprets Spoken Dialogue Ontologies and on this basis generates VoiceXML dialogue snippets. The dialogue snippets can be interpreted by all speech servers that provide VoiceXML support and therefore make the dialogue management independent of the hosting systems providing speech recognition and synthesis. Ontologies are used within the framework of our prototype to represent specific spoken dialogue domains that can dynamically be broadened or tightened during an ongoing dialogue. We provide an exemplary dialogue encoded as an OWL model and explain how this model is interpreted by the dialogue manager. The combination of a unified model for dialogue domains and the strict model-view-controller architecture that underlies the dialogue manager leads to an efficient system that allows for a new way of spoken dialogue system development and can be used for further research on adaptive spoken dialogue strategies.

2009

On NoMatchs, NoInputs and BargeIns: Do Non-Acoustic Features Support Anger Detection?
Alexander Schmitt | Tobias Heinroth | Jackson Liscombe
Proceedings of the SIGDIAL 2009 Conference