JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models

Arefa ., Mohammed Abbas Ansari, Chandni Saxena, Tanvir Ahmad


Abstract
This paper presents our system for SemEval-2024 Task 3: “The Competition of Multimodal Emotion Cause Analysis in Conversations”. Effectively capturing emotions in human conversations requires integrating multiple modalities such as text, audio, and video. However, the complexity of these diverse modalities makes it challenging to develop an efficient multimodal emotion cause analysis (ECA) system. We address these challenges with a two-step framework, which we implement in two different ways. In Approach 1, we instruction-tune two separate Llama 2 models, one for emotion prediction and one for cause prediction. In Approach 2, we use GPT-4V for conversation-level video description and employ in-context learning with annotated conversations using GPT-3.5. Our system ranked 4th, and system ablation experiments demonstrate that our proposed solutions achieve significant performance gains.
Anthology ID:
2024.semeval-1.223
Volume:
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Atul Kr. Ojha, A. Seza Doğruöz, Harish Tayyar Madabushi, Giovanni Da San Martino, Sara Rosenthal, Aiala Rosá
Venue:
SemEval
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
1561–1576
URL:
https://aclanthology.org/2024.semeval-1.223
Cite (ACL):
Arefa ., Mohammed Abbas Ansari, Chandni Saxena, and Tanvir Ahmad. 2024. JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models. In Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), pages 1561–1576, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models (Arefa et al., SemEval 2024)
PDF:
https://preview.aclanthology.org/ingestion-checklist/2024.semeval-1.223.pdf
Supplementary material:
 2024.semeval-1.223.SupplementaryMaterial.txt
Supplementary material:
 2024.semeval-1.223.SupplementaryMaterial.zip