Abstract
Previous models of turn-taking have mostly been trained for specific turn-taking decisions, such as discriminating between turn shifts and turn retentions in pauses. In this paper, we present a predictive, continuous model of turn-taking using Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs). The model is trained on human-human dialogue data to predict upcoming speech activity in a future time window. We show how this general model can be applied to two different tasks that it was not specifically trained for. First, to predict whether a turn shift will occur or not in pauses, where the model achieves better performance than human observers, and better than results achieved with more traditional models. Second, to make a prediction at speech onset as to whether the utterance will be a short backchannel or a longer utterance. Finally, we show how the hidden layer of the network can be used as a feature vector for turn-taking decisions in a human-robot interaction scenario.
- Anthology ID:
- W17-5527
- Volume:
- Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
- Month:
- August
- Year:
- 2017
- Address:
- Saarbrücken, Germany
- Editors:
- Kristiina Jokinen, Manfred Stede, David DeVault, Annie Louis
- Venue:
- SIGDIAL
- SIG:
- SIGDIAL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 220–230
- URL:
- https://preview.aclanthology.org/add_missing_videos/W17-5527/
- DOI:
- 10.18653/v1/W17-5527
- Cite (ACL):
- Gabriel Skantze. 2017. Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, pages 220–230, Saarbrücken, Germany. Association for Computational Linguistics.
- Cite (Informal):
- Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks (Skantze, SIGDIAL 2017)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/W17-5527.pdf
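As a rough illustration of the idea in the abstract, the sketch below shows how a continuous, per-frame prediction of future speech activity could be turned into a discrete turn-taking decision at a pause. The LSTM itself is not shown; the per-frame probabilities are stand-ins for its output, and all names, values, and the mean-comparison decision rule are illustrative assumptions, not details taken from the paper.

```python
# Sketch: deriving a HOLD/SHIFT decision at a pause from a model's
# continuous predictions of upcoming speech activity. The lists of
# per-frame probabilities stand in for the output of an LSTM; the
# aggregation rule (comparing window means) is an assumption made
# for illustration only.

def decide_at_pause(pred_current, pred_other):
    """Compare mean predicted future speech activity for the current
    speaker vs. the interlocutor over the prediction window.

    Returns 'HOLD' if the current speaker is predicted to keep the
    turn, 'SHIFT' if the other speaker is predicted to take it."""
    score_current = sum(pred_current) / len(pred_current)
    score_other = sum(pred_other) / len(pred_other)
    return "HOLD" if score_current >= score_other else "SHIFT"

# Illustrative per-frame probabilities over a short prediction window:
hold_case = decide_at_pause([0.8, 0.7, 0.9], [0.2, 0.1, 0.2])
shift_case = decide_at_pause([0.1, 0.2, 0.1], [0.7, 0.8, 0.9])
```

The same continuous output could, under a different aggregation, support the paper's second task (classifying an onset as backchannel vs. longer utterance); only the decision rule over the predicted window would change.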