@inproceedings{das-2021-classifying-emotional,
  title     = {Classifying Emotional Utterances by Employing Multi-modal Speech Emotion Recognition},
  author    = {Das, Dipankar},
  editor    = {Biswas, Anupam and
               Laskar, Rabul Hussain and
               Roy, Pinki},
  booktitle = {Proceedings of the Workshop on Speech and Music Processing 2021},
  month     = dec,
  year      = {2021},
  address   = {NIT Silchar, India},
  publisher = {NLP Association of India (NLPAI)},
  url       = {https://aclanthology.org/2021.smp-1.1},
  pages     = {1--13},
  abstract  = {Deep learning methods are being applied to several speech processing problems in recent years. In the present work, we have explored different deep learning models for speech emotion recognition. We have employed normal deep feedforward neural network (FFNN) and convolutional neural network (CNN) to classify audio files according to their emotional content. Comparative study indicates that CNN model outperforms FFNN in case of emotions as well as gender classification. It was observed that the sole audio based models can capture the emotions up to a certain limit. Thus, we attempted a multi-modal framework by combining the benefits of the audio and text features and employed them into a recurrent encoder. Finally, the audio and text encoders are merged to provide the desired impact on various datasets. In addition, a database consists of emotional utterances of several words has also been developed as a part of this work. It contains same word in different emotional utterances. Though the size of the database is not that large but this database is ideally supposed to contain all the English words that exist in an English dictionary.},
}
@comment{Anthology export residue (informal citation), kept for reference:
Markdown (Informal)
[Classifying Emotional Utterances by Employing Multi-modal Speech Emotion Recognition](https://aclanthology.org/2021.smp-1.1) (Das, SMP 2021)
ACL
}