Multi-Modal Sequence Fusion via Recursive Attention for Emotion Recognition

Rory Beard; Ritwik Das; Raymond W. M. Ng; P. G. Keerthana Gopalakrishnan; Luka Eerens; Pawel Swietojanski; Ondrej Miksik

doi:10.18653/v1/K18-1025

Abstract

Natural human communication is nuanced and inherently multi-modal. Humans possess specialised sensoria for processing vocal, visual, linguistic, and para-linguistic information, but form an intricately fused percept of the multi-modal data stream to provide a holistic representation. Analysis of emotional content in face-to-face communication is a cognitive task to which humans are particularly attuned, given its sociological importance, and poses a difficult challenge for machine emulation due to the subtlety and expressive variability of cross-modal cues. Inspired by the empirical success of recent so-called End-To-End Memory Networks and related works, we propose an approach based on recursive multi-attention with a shared external memory updated over multiple gated iterations of analysis. We evaluate our model across several large multi-modal datasets and show that global contextualised memory with gated memory update can effectively achieve emotion recognition.

Anthology ID: K18-1025
Volume: Proceedings of the 22nd Conference on Computational Natural Language Learning
Month: October
Year: 2018
Address: Brussels, Belgium
Editors: Anna Korhonen, Ivan Titov
Venue: CoNLL
SIG: SIGNLL
Publisher: Association for Computational Linguistics
Pages: 251–259
URL: https://aclanthology.org/K18-1025
DOI: 10.18653/v1/K18-1025
Cite (ACL): Rory Beard, Ritwik Das, Raymond W. M. Ng, P. G. Keerthana Gopalakrishnan, Luka Eerens, Pawel Swietojanski, and Ondrej Miksik. 2018. Multi-Modal Sequence Fusion via Recursive Attention for Emotion Recognition. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 251–259, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal): Multi-Modal Sequence Fusion via Recursive Attention for Emotion Recognition (Beard et al., CoNLL 2018)
PDF: https://preview.aclanthology.org/naacl24-info/K18-1025.pdf
Data: CMU-MOSEI
