Shun Katada
2024
Multimodal Spoken Dialogue System with Biosignals
Shun Katada
Proceedings of the 20th Workshop of Young Researchers' Roundtable on Spoken Dialogue Systems
The dominance of large language models has forced research directions in many domains to change. Large-scale models are growing, and acquiring knowledge, at a remarkable pace, so researchers need the ability and foresight to adapt to a rapidly changing environment. In this position paper, the author introduces the author's research interests and discusses how they relate to one another from the perspective of spoken dialogue systems, focusing on multimodal processing and affective computing. The paper also presents the effects of large language models on spoken dialogue systems research and topics for discussion.
Collecting Human-Agent Dialogue Dataset with Frontal Brain Signal toward Capturing Unexpressed Sentiment
Shun Katada | Ryu Takeda | Kazunori Komatani
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Multimodal information such as text and audiovisual data has been used for emotion/sentiment estimation during human-agent dialogue; however, user sentiments are not necessarily expressed explicitly during dialogues. Biosignals such as brain signals recorded with an electroencephalogram (EEG) sensor have been a focus of affective computing research as a way to capture unexpressed emotional changes in controlled experimental environments. In this study, we collect and analyze multimodal data, including EEG, during human-agent dialogue, with the aim of capturing unexpressed sentiment. Our contributions are as follows. (1) We create a new multimodal human-agent dialogue dataset that includes not only text and audiovisual data but also frontal EEGs and physiological signals recorded during the dialogue. In total, about 500 minutes of chat dialogue were collected from thirty participants aged 20 to 70. (2) We present a novel method for removing eye-blink noise from frontal EEGs, which applies facial landmark tracking to detect and delete the noise. (3) An experimental evaluation showed the effectiveness of the frontal EEGs: although they have only three channels, they improved sentiment estimation performance when combined with other modalities through multimodal fusion.
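As a rough illustration of the idea in contribution (2), the sketch below flags blink intervals from a per-sample eye-openness value and deletes the corresponding EEG samples. It is not the authors' implementation: the abstract gives no details, so the eye-openness signal (assumed here to be derived from facial landmarks and already time-aligned with the EEG), the threshold, the padding, and the function names `blink_mask` and `drop_blink_samples` are all hypothetical.

```python
from typing import List, Sequence


def blink_mask(eye_openness: Sequence[float],
               threshold: float = 0.2,
               pad: int = 1) -> List[bool]:
    """Flag samples judged to be part of a blink.

    Assumption: `eye_openness` is one value per EEG sample, computed
    elsewhere from facial landmarks (smaller means more closed).
    Each below-threshold sample is widened by `pad` samples per side.
    """
    n = len(eye_openness)
    mask = [False] * n
    for i, value in enumerate(eye_openness):
        if value < threshold:
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                mask[j] = True
    return mask


def drop_blink_samples(eeg: Sequence[Sequence[float]],
                       mask: Sequence[bool]) -> List[List[float]]:
    """Delete flagged samples from every channel (channels x samples)."""
    return [[x for x, flagged in zip(channel, mask) if not flagged]
            for channel in eeg]


# Toy data: three frontal channels, eight samples, one blink in the middle.
openness = [0.30, 0.31, 0.29, 0.10, 0.05, 0.12, 0.30, 0.32]
eeg = [
    [1.0, 1.1, 0.9, 5.0, 6.0, 5.5, 1.0, 1.2],
    [0.8, 0.9, 1.0, 4.0, 4.5, 4.2, 0.9, 0.7],
    [1.2, 1.0, 1.1, 5.5, 6.5, 6.0, 1.1, 1.0],
]

mask = blink_mask(openness, threshold=0.2, pad=1)
clean = drop_blink_samples(eeg, mask)
```

Deleting samples shortens the recording, so any downstream step that relies on timestamps would need to keep the mask alongside the cleaned signal.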