2016
Could Speaker, Gender or Age Awareness be beneficial in Speech-based Emotion Recognition?
Maxim Sidorov | Alexander Schmitt | Eugene Semenkin | Wolfgang Minker
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Emotion Recognition (ER) is an important part of dialogue analysis that can be used to improve the quality of Spoken Dialogue Systems (SDSs). The emotional hypothesis for an end-user's current response might be utilised by the dialogue manager component to change the SDS strategy, which could result in a quality enhancement. In this study, additional speaker-related information is used to improve the performance of the speech-based ER process. The analysed information is the speaker identity, gender and age of a user. Two schemes are described here: using the additional information as an independent variable within the feature vector, and creating separate emotional models for each speaker, gender or age cluster. The performance of the proposed approaches was compared against a baseline ER system, in which no additional information was used, on a number of emotional speech corpora in German, English, Japanese and Russian. The study revealed that for some of the corpora the proposed approach significantly outperforms the baseline methods, with a relative difference of up to 11.9%.
2014
Speech-Based Emotion Recognition: Feature Selection by Self-Adaptive Multi-Criteria Genetic Algorithm
Maxim Sidorov | Christina Brester | Wolfgang Minker | Eugene Semenkin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Automated emotion recognition has a number of applications in Interactive Voice Response systems, call centers, etc. While existing feature sets and methods for automated emotion recognition have already achieved reasonable results, there is still considerable room for improvement. In particular, the optimal feature set for representing speech signals in speech-based emotion recognition remains an open question. In our research, we tried to identify the most essential features using a self-adaptive multi-objective genetic algorithm as a feature selection technique and a probabilistic neural network as a classifier. The proposed approach was evaluated on a number of databases in different languages (English, German), which were represented by 37- and 384-dimensional feature sets. According to the obtained results, the developed technique increases emotion recognition performance by up to 26.08% relative improvement in accuracy. Moreover, emotion recognition performance scores improved for all applied databases.
Opinion Mining and Topic Categorization with Novel Term Weighting
Tatiana Gasanova | Roman Sergienko | Shakhnaz Akhmedova | Eugene Semenkin | Wolfgang Minker
Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
2013
A Semi-supervised Approach for Natural Language Call Routing
Tatiana Gasanova | Eugene Zhukov | Roman Sergienko | Eugene Semenkin | Wolfgang Minker
Proceedings of the SIGDIAL 2013 Conference
2012
Speech and Language Resources for LVCSR of Russian
Sergey Zablotskiy | Alexander Shvets | Maxim Sidorov | Eugene Semenkin | Wolfgang Minker
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
A syllable-based language model reduces the lexicon size by a factor of hundreds. This is especially beneficial for highly inflected languages like Russian, which have an abundance of word forms arising from various grammatical categories. However, the main challenge is the concatenation of recognised syllables into the originally spoken sentence or phrase, particularly in the presence of syllable recognition mistakes. Natural fluent speech does not usually carry clear information about word boundaries. In this paper, a method for syllable concatenation and error correction is suggested and tested. It is based on a purpose-designed co-evolutionary asymptotic probabilistic genetic algorithm that determines the most likely sentence corresponding to the recognised chain of syllables within an acceptable time frame. The advantage of this genetic algorithm modification is the minimal number of settings that must be manually adjusted compared to the standard algorithm. The data used for acoustic and language modelling are also described. A particular issue is the preprocessing of the textual data, especially the handling of abbreviations and Arabic and Roman numerals, since their inflection mostly depends on context and grammar.