Abstract
In this study, we introduce an MLP-based approach for extracting multimodal cause utterances in conversations, using the multimodal conversational emotion cause annotations from the ECF dataset. Our research focuses on evaluating a bi-modal framework that integrates video and audio embeddings to analyze emotional expressions within dialogues. The core of our methodology is the extraction of embeddings from pre-trained models for each modality, followed by their concatenation and classification with an MLP network. We compare accuracy across different modality combinations, including text-audio-video, video-audio, and audio-only.
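The paper itself does not include code; the following is a minimal sketch of the pipeline the abstract describes (per-modality pre-trained embeddings, concatenation, MLP classification). The embedding dimensions, layer sizes, and the binary cause/non-cause output are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class FusionMLP(nn.Module):
    """Concatenates per-utterance audio and video embeddings and
    classifies whether the utterance is an emotion cause (binary).
    All dimensions below are hypothetical placeholders."""

    def __init__(self, audio_dim=768, video_dim=512, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + video_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, 2),  # cause / non-cause logits
        )

    def forward(self, audio_emb, video_emb):
        # Fuse the two modalities by simple concatenation.
        fused = torch.cat([audio_emb, video_emb], dim=-1)
        return self.net(fused)


# Toy usage with random tensors standing in for embeddings
# extracted from pre-trained audio and video encoders.
model = FusionMLP()
audio = torch.randn(4, 768)
video = torch.randn(4, 512)
logits = model(audio, video)
print(logits.shape)  # torch.Size([4, 2])
```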
- Anthology ID:
- 2024.semeval-1.44
- Volume:
- Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
- Month:
- June
- Year:
- 2024
- Address:
- Mexico City, Mexico
- Editors:
- Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 285–290
- URL:
- https://aclanthology.org/2024.semeval-1.44
- DOI:
- 10.18653/v1/2024.semeval-1.44
- Cite (ACL):
- Shu Li, Zicen Liao, and Huizhi Liang. 2024. NCL Team at SemEval-2024 Task 3: Fusing Multimodal Pre-training Embeddings for Emotion Cause Prediction in Conversations. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 285–290, Mexico City, Mexico. Association for Computational Linguistics.
- Cite (Informal):
- NCL Team at SemEval-2024 Task 3: Fusing Multimodal Pre-training Embeddings for Emotion Cause Prediction in Conversations (Li et al., SemEval 2024)
- PDF:
- https://aclanthology.org/2024.semeval-1.44.pdf