Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks

Gabriel Skantze

doi:10.18653/v1/W17-5527

Abstract

Previous models of turn-taking have mostly been trained for specific turn-taking decisions, such as discriminating between turn shifts and turn retention in pauses. In this paper, we present a predictive, continuous model of turn-taking using Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN). The model is trained on human-human dialogue data to predict upcoming speech activity in a future time window. We show how this general model can be applied to two different tasks that it was not specifically trained for. The first is to predict whether a turn shift will occur in a pause, where the model performs better than human observers and better than more traditional models. The second is to predict, at speech onset, whether the utterance will be a short backchannel or a longer utterance. Finally, we show how the hidden layer in the network can be used as a feature vector for turn-taking decisions in a human-robot interaction scenario.
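
The training objective described in the abstract (predicting upcoming speech activity in a future time window) can be illustrated with a minimal sketch of how such targets might be built. Everything here is an assumption made for illustration and is not taken from the paper: the function name `future_activity_targets`, the per-frame binary voice-activity input, and the window length are all hypothetical.

```python
from typing import List


def future_activity_targets(vad: List[int], window: int) -> List[List[int]]:
    """Build one target vector per frame from a binary voice-activity track.

    For frame t, the target is the speech activity in frames
    t+1 .. t+window (an assumed framing of "a future time window").
    Frames too close to the end to have a full window are skipped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    targets = []
    for t in range(len(vad) - window):
        targets.append(vad[t + 1 : t + 1 + window])
    return targets


# Hypothetical example: 1 = speech, 0 = silence, one value per frame.
vad = [1, 1, 0, 0, 1, 1]
targets = future_activity_targets(vad, window=2)
```

A sequence model such as an LSTM could then be trained to output each target vector from the features observed up to the corresponding frame; the feature set and frame rate are left open here because the abstract does not state them.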

Anthology ID: W17-5527
Volume: Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
Month: August
Year: 2017
Address: Saarbrücken, Germany
Editors: Kristiina Jokinen, Manfred Stede, David DeVault, Annie Louis
Venue: SIGDIAL
SIG: SIGDIAL
Publisher: Association for Computational Linguistics
Pages: 220–230
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/W17-5527/
DOI: 10.18653/v1/W17-5527
Cite (ACL): Gabriel Skantze. 2017. Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 220–230, Saarbrücken, Germany. Association for Computational Linguistics.
Cite (Informal): Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks (Skantze, SIGDIAL 2017)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/W17-5527.pdf
