Yoshiko Arimoto


2016

Comparison of Emotional Understanding in Modality-Controlled Environments using Multimodal Online Emotional Communication Corpus
Yoshiko Arimoto | Kazuo Okanoya
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In online computer-mediated communication, speakers are thought to have difficulty perceiving their partners' emotions and conveying their own. To explain why online emotional communication is so difficult and to investigate how this problem might be solved, a multimodal online emotional communication corpus was constructed by recording the emotional expressions and reactions of approximately 100 speakers in a modality-controlled environment. Speakers communicated over the Internet using video chat, voice chat or text chat; their face-to-face conversations were recorded for comparison. The corpus incorporates emotion labels obtained by having the speakers evaluate their own dynamic emotional states, together with measurements of the speakers' facial expressions, vocal expressions and autonomic nervous system activity. As the initial study of this project, which uses this large-scale emotional communication corpus, the accuracy of online emotional understanding was assessed on the basis of the speakers' emotion labels, and the speakers' questionnaire answers regarding the differences between the online chats and the face-to-face conversations in which they actually participated were summarized. The results reveal that speakers have difficulty communicating their emotions in online environments regardless of the communication modality, and that inaccurate emotional understanding occurs more frequently in online computer-mediated communication than in face-to-face conversation.
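A minimal sketch of how per-modality emotional-understanding accuracy could be computed, assuming a hypothetical record layout in which each utterance carries the speaker's self-rated emotion label and the partner's judgment of it (this is not the corpus's actual data format):

```python
# Sketch only: compare a speaker's self-rated emotion against the partner's
# judgment, separately for each communication modality.
from collections import defaultdict

def understanding_accuracy(records):
    """records: iterable of dicts with keys 'modality'
    (e.g. 'video', 'voice', 'text', 'face-to-face'),
    'self_label' (speaker's own rating) and 'partner_label' (partner's guess)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["modality"]] += 1
        if r["self_label"] == r["partner_label"]:
            hits[r["modality"]] += 1
    return {m: hits[m] / totals[m] for m in totals}

# Toy usage:
toy = [
    {"modality": "text", "self_label": "joy", "partner_label": "neutral"},
    {"modality": "face-to-face", "self_label": "joy", "partner_label": "joy"},
]
print(understanding_accuracy(toy))  # {'text': 0.0, 'face-to-face': 1.0}
```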

Accuracy of Automatic Cross-Corpus Emotion Labeling for Conversational Speech Corpus Commonization
Hiroki Mori | Atsushi Nagaoka | Yoshiko Arimoto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

There is a major incompatibility in emotion labeling frameworks among emotional speech corpora: some are category-based and others dimension-based. Commonizing such corpora requires inter-corpus emotion labeling in both frameworks, but having human annotators do this is too costly in most cases. This paper examines the possibility of automatic cross-corpus emotion labeling. To evaluate the effectiveness of the automatic labeling, a comprehensive emotion annotation of two conversational corpora, UUDB and OGVC, was performed. Using a state-of-the-art machine learning technique, dimensional and categorical emotion estimation models were trained and tested against the two corpora. For emotion dimension estimation, automatic cross-corpus labeling of the other corpus was effective for the aroused-sleepy, dominant-submissive and interested-indifferent dimensions, showing only slight performance degradation relative to the same-corpus results. The performance of emotion category estimation, on the other hand, was not sufficient.
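A minimal sketch of cross-corpus emotion-dimension labeling under stated assumptions: a regressor is trained on acoustic features of one corpus (e.g. UUDB) and used to predict a dimension such as aroused-sleepy for the other corpus (e.g. OGVC). The feature extraction, the choice of learner and the evaluation by correlation are placeholders, not the paper's exact setup:

```python
# Sketch only: train on one corpus, predict a single emotion dimension on the
# other, and measure agreement with that corpus's human labels.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from scipy.stats import pearsonr

def cross_corpus_dimension_model(X_train, y_train, X_other, y_other):
    """X_*: acoustic feature matrices; y_*: human ratings of one dimension."""
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
    model.fit(X_train, y_train)
    pred = model.predict(X_other)
    r, _ = pearsonr(y_other, pred)  # agreement with the other corpus's labels
    return model, r

# Usage with random stand-in data:
rng = np.random.default_rng(0)
X_uudb, y_uudb = rng.normal(size=(200, 20)), rng.uniform(1, 7, 200)
X_ogvc, y_ogvc = rng.normal(size=(150, 20)), rng.uniform(1, 7, 150)
_, r = cross_corpus_dimension_model(X_uudb, y_uudb, X_ogvc, y_ogvc)
print(f"cross-corpus correlation: {r:.2f}")
```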

2008

Automatic Emotional Degree Labeling for Speakers’ Anger Utterance during Natural Japanese Dialog
Yoshiko Arimoto | Sumio Ohno | Hitoshi Iida
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper describes a method for automatically labeling the emotional degree of speakers' anger utterances in natural Japanese dialog. First, we explain how anger utterances that appeared naturally in Japanese dialog were recorded. Manual emotional degree labeling was conducted in advance, grading the utterances on a 6-point Likert scale to obtain reference anger degrees. Experiments on automatic anger degree estimation were then conducted to label each utterance with an anger degree based on its acoustic features. Estimation experiments were also conducted with speaker-dependent datasets to examine the influence of individual emotional expression on automatic emotional degree labeling. As a result, almost all of the speaker-dependent models showed a higher adjusted R-squared, making them superior to the speaker-independent model in estimation capability. However, the residual between the automatic and manual emotional degrees for the speaker-independent model (0.73) is comparable to those of the speaker-dependent models, so there remains the possibility of labeling utterances with the speaker-independent model.
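A minimal sketch of the kind of setup the abstract describes, under assumptions (a plain linear regression and hypothetical feature columns; this is not the authors' implementation): regress the manually labeled anger degree onto acoustic features and report adjusted R-squared and the mean absolute residual, whether for a speaker-independent dataset or a per-speaker subset.

```python
# Sketch only: fit anger degree from acoustic features and report
# adjusted R-squared and mean absolute residual.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def adjusted_r2(y_true, y_pred, n_features):
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

def fit_anger_degree(X, y):
    """X: acoustic features (e.g. F0, power, duration measures);
    y: manual anger degree on a 6-point scale."""
    model = LinearRegression().fit(X, y)
    pred = model.predict(X)
    residual = np.mean(np.abs(y - pred))  # mean absolute residual
    return model, adjusted_r2(y, pred, X.shape[1]), residual

# Toy usage with random stand-in data:
rng = np.random.default_rng(1)
X, y = rng.normal(size=(120, 5)), rng.integers(1, 7, 120).astype(float)
_, adj_r2, res = fit_anger_degree(X, y)
print(f"adjusted R^2 = {adj_r2:.2f}, mean |residual| = {res:.2f}")
```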

2007

Predicting Evidence of Understanding by Monitoring User’s Task Manipulation in Multimodal Conversations
Yukiko Nakano | Kazuyoshi Murata | Mika Enomoto | Yoshiko Arimoto | Yasuhiro Asa | Hirohiko Sagawa
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions