Automatically augmenting an emotion dataset improves classification using audio

Egor Lakomkin, Cornelius Weber, Stefan Wermter


Abstract
In this work, we tackle the problem of speech emotion classification. One of the issues in the area of affective computation is that the amount of annotated data is very limited. At the same time, the number of ways that the same emotion can be expressed verbally is enormous due to variability between speakers. This is one of the factors that limit performance and generalization. We propose a simple method that extracts audio samples from movies using textual sentiment analysis. As a result, it is possible to automatically construct a larger dataset of audio samples containing positive, negative, and neutral speech. We show that pretraining a recurrent neural network on such a dataset yields better results on the challenging EmotiW corpus. This experiment shows the potential benefit of combining textual sentiment analysis with vocal information.
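The extraction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the movie text comes with SRT-style timed subtitle cues, and it swaps in a toy word-list classifier for whatever sentiment analyzer the paper actually used. All names here (`Segment`, `label_text`, `label_cues`, the word lists) are hypothetical. The output is a list of time spans with sentiment labels, which could then be used to cut the matching audio clips.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float  # cue start time in seconds
    end: float    # cue end time in seconds
    label: str    # "positive", "negative", or "neutral"


# Toy lexicon (assumption): a stand-in for a real textual sentiment analyzer.
POSITIVE = {"love", "great", "wonderful", "happy"}
NEGATIVE = {"hate", "terrible", "awful", "angry"}

# Matches SRT-style timestamps such as 00:01:02,500 (millisecond precision).
_TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an HH:MM:SS,mmm timestamp to seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def label_text(text: str) -> str:
    """Label a subtitle line by counting distinct lexicon words."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def label_cues(srt_text: str) -> List[Segment]:
    """Turn subtitle cues into (start, end, label) audio segment proposals."""
    segments = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = line.split("-->")
                text = " ".join(lines[i + 1:])
                segments.append(
                    Segment(to_seconds(start), to_seconds(end), label_text(text))
                )
                break
    return segments
```

Each returned segment gives the time window from which an audio sample would be extracted, together with the label inherited from the text. A real system would also need to handle subtitle and audio misalignment, which this sketch ignores.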
Anthology ID:
E17-2031
Volume:
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Mirella Lapata, Phil Blunsom, Alexander Koller
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
194–197
URL:
https://aclanthology.org/E17-2031
Cite (ACL):
Egor Lakomkin, Cornelius Weber, and Stefan Wermter. 2017. Automatically augmenting an emotion dataset improves classification using audio. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 194–197, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
Automatically augmenting an emotion dataset improves classification using audio (Lakomkin et al., EACL 2017)
PDF:
https://preview.aclanthology.org/ml4al-ingestion/E17-2031.pdf