Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data

Woo Yong Choi, Kyu Ye Song, Chan Woo Lee


Abstract
Emotion recognition has become a popular research topic, especially in the field of human-computer interaction. Previous work has largely involved unimodal analysis of emotion, while recent efforts focus on multimodal emotion recognition from vision and speech. In this paper, we propose a new method for learning hidden representations shared between speech and text data alone, using convolutional attention networks. Compared to a shallow model that simply concatenates feature vectors, the proposed attention model performs much better at classifying emotion from the speech and text data contained in the CMU-MOSEI dataset.
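The contrast drawn in the abstract, between concatenating feature vectors and attending across modalities, can be illustrated with a minimal sketch. This is not the authors' architecture: it omits the convolutional layers entirely, and the function names (`concat_fusion`, `attention_fusion`), the scaled dot-product scoring, and the assumption that speech frames and text tokens share the same feature dimension are all hypothetical choices made for illustration.

```python
import math


def concat_fusion(speech_vec, text_vec):
    """Shallow baseline: join two utterance-level feature vectors end to end."""
    return list(speech_vec) + list(text_vec)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fusion(speech_frames, text_tokens):
    """Illustrative cross-modal attention (an assumption, not the paper's model).

    Each text token scores every speech frame with a scaled dot product,
    the scores are normalised with softmax, and the weighted sum of the
    speech frames is appended to the token's own features.
    """
    dim = len(speech_frames[0])
    scale = math.sqrt(len(text_tokens[0]))
    fused = []
    for token in text_tokens:
        weights = softmax([dot(token, frame) / scale for frame in speech_frames])
        context = [
            sum(w * frame[d] for w, frame in zip(weights, speech_frames))
            for d in range(dim)
        ]
        fused.append(list(token) + context)
    return fused


# Toy inputs: three speech frames and two text tokens, each with 2 features.
speech_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
fused = attention_fusion(speech_frames, text_tokens)
```

In this sketch, concatenation treats the two modalities as independent blocks, whereas the attention variant lets each text token weight the speech frames it is most similar to before the features are combined.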
Anthology ID: W18-3304
Volume: Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML)
Month: July
Year: 2018
Address: Melbourne, Australia
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 28–34
URL: https://aclanthology.org/W18-3304
DOI: 10.18653/v1/W18-3304
Cite (ACL): Woo Yong Choi, Kyu Ye Song, and Chan Woo Lee. 2018. Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data. In Proceedings of Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML), pages 28–34, Melbourne, Australia. Association for Computational Linguistics.
Cite (Informal): Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data (Choi et al., ACL 2018)
PDF: https://preview.aclanthology.org/ingestion-script-update/W18-3304.pdf