Joakim Gustafson


2020

Chinese Whispers: A Multimodal Dataset for Embodied Language Grounding
Dimosthenis Kontogiorgos | Elena Sibirtseva | Joakim Gustafson
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we introduce a multimodal dataset in which subjects instruct each other on how to assemble IKEA furniture. Using the concept of ‘Chinese Whispers’, an old children’s game, we employ a novel method to avoid implicit experimenter biases. We let subjects instruct each other on the nature of the task: the process of the furniture assembly. Uncertainty, hesitations, repairs and self-corrections are naturally introduced in the incremental process of establishing common ground. The corpus consists of 34 interactions, in which each subject first assembles and then instructs. We collected speech, eye-gaze, pointing gestures, and object movements, as well as subjective interpretations of mutual understanding, collaboration and task recall. The corpus is of particular interest to researchers working on multimodal signals in situated dialogue, especially referential communication and the process of language grounding.

Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis
Eva Szekely | Jens Edlund | Joakim Gustafson
Proceedings of the 12th Language Resources and Evaluation Conference

By definition, spontaneous speech is unscripted and created on the fly by the speaker. It is dramatically different from read speech, where the words are authored as text before they are spoken. Spontaneous speech is emergent and transient, whereas text read out loud is pre-planned. For this reason, it is inappropriate to evaluate the usability and appropriateness of spontaneous speech synthesis by having it read out written texts sampled from, for example, newspapers or books. Instead, we need to use transcriptions of speech as the target, something that is much less readily available. In this paper, we introduce Starmap, a tool that allows developers to select a varied, representative set of utterances from a spoken genre, to be used for evaluation of TTS for a given domain. The selection can be made from any speech recording, without the need for transcription. The tool uses interactive visualisation of prosodic features with t-SNE, along with a tree-based algorithm, to guide the user through thousands of utterances and ensure coverage of a variety of prompts. A listening test has shown that with a selection of genre-specific utterances, it is possible to show significant differences across genres between two synthetic voices built from spontaneous speech.

2018

A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction
Dimosthenis Kontogiorgos | Vanya Avramova | Simon Alexanderson | Patrik Jonell | Catharine Oertel | Jonas Beskow | Gabriel Skantze | Joakim Gustafson
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Crowdsourced Multimodal Corpora Collection Tool
Patrik Jonell | Catharine Oertel | Dimosthenis Kontogiorgos | Jonas Beskow | Joakim Gustafson
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Hidden Resources ― Strategies to Acquire and Exploit Potential Spoken Language Resources in National Archives
Jens Edlund | Joakim Gustafson
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In 2014, the Swedish government tasked a Swedish agency, The Swedish Post and Telecom Authority (PTS), with investigating how best to create and populate an infrastructure for spoken language resources (Ref N2014/2840/ITP). As a part of this work, the department of Speech, Music and Hearing at KTH Royal Institute of Technology has taken inventory of existing potential spoken language resources, mainly in Swedish national archives and other governmental or public institutions. In this position paper, key priorities, perspectives, and strategies that may be of general, rather than Swedish, interest are presented. We discuss the broad types of potential spoken language resources available; the extent to which these resources are free to use; and, as the main contribution, strategies to ensure the continuous acquisition of spoken language resources in a manner that facilitates speech and speech technology research.

2015

Automatic Detection of Miscommunication in Spoken Dialogue Systems
Raveesh Meena | José Lopes | Gabriel Skantze | Joakim Gustafson
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

Proceedings of the EACL 2014 Workshop on Dialogue in Motion
Tiphaine Dalmas | Jana Götze | Joakim Gustafson | Srinivasan Janarthanam | Jan Kleindienst | Christian Mueller | Amanda Stent | Andreas Vlachos
Proceedings of the EACL 2014 Workshop on Dialogue in Motion

Human pause and resume behaviours for unobtrusive humanlike in-car spoken dialogue systems
Jens Edlund | Fredrik Edelstam | Joakim Gustafson
Proceedings of the EACL 2014 Workshop on Dialogue in Motion

Crowdsourcing Street-level Geographic Information Using a Spoken Dialogue System
Raveesh Meena | Johan Boye | Gabriel Skantze | Joakim Gustafson
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

2013

Human Evaluation of Conceptual Route Graphs for Interpreting Spoken Route Descriptions
Raveesh Meena | Gabriel Skantze | Joakim Gustafson
Proceedings of the IWCS 2013 Workshop on Computational Models of Spatial Language Interpretation and Generation (CoSLI-3)

The Map Task Dialogue System: A Test-bed for Modelling Human-Like Dialogue
Raveesh Meena | Gabriel Skantze | Joakim Gustafson
Proceedings of the SIGDIAL 2013 Conference

A Data-driven Model for Timing Feedback in a Map Task Dialogue System
Raveesh Meena | Gabriel Skantze | Joakim Gustafson
Proceedings of the SIGDIAL 2013 Conference

2009

Eliciting Interactional Phenomena in Human-Human Dialogues
Joakim Gustafson | Miray Merkes
Proceedings of the SIGDIAL 2009 Conference

Attention and Interaction Control in a Human-Human-Computer Dialogue Setting
Gabriel Skantze | Joakim Gustafson
Proceedings of the SIGDIAL 2009 Conference

2005

How to do Dialogue in a Fairy-tale World
Johan Boye | Joakim Gustafson
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue

2004

The NICE Fairy-tale Game System
Joakim Gustafson | Linda Bell | Johan Boye | Anders Lindström | Mats Wirén
Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004