SubmissionNumber#=%=#45
FinalPaperTitle#=%=#NCL Team at SemEval-2024 Task 3: Fusing Multimodal Pre-training Embeddings for Emotion Cause Prediction in Conversations
ShortPaperTitle#=%=#
NumberOfPages#=%=#6
CopyrightSigned#=%=#Shu Li
JobTitle#==#
Organization#==#organization Name: Beijing Accent Advertising Co., Ltd.
Address: Beijing Shi Chaoyang Qu Qiangtai Xiang Dong Bajianfang Cun Beigangzi Dahuan Nei Jia 5 Hao Yuan
organization2:
Name: School of Computing, Newcastle University, Newcastle upon Tyne, UK
Address: Newcastle Newcastle University Newcastle upon Tyne NE1 7RU
Abstract#==#In this study, we introduce an MLP approach for extracting multimodal cause utterances in conversations, utilizing the multimodal conversational emotion causes from the ECF dataset. Our research focuses on evaluating a bi-modal framework that integrates video and audio embeddings to analyze emotional expressions within dialogues. The core of our methodology involves the extraction of embeddings from pre-trained models for each modality, followed by their concatenation and subsequent classification via an MLP network. We compared the accuracy across different modality combinations, including text-audio-video, video-audio, and audio only.
Author{1}{Firstname}#=%=#Shu
Author{1}{Lastname}#=%=#Li
Author{1}{Username}#=%=#bbgame
Author{1}{Email}#=%=#15510388836@163.com
Author{1}{Affiliation}#=%=#Beijing Accent Advertising Co., Ltd.
Author{2}{Firstname}#=%=#Zicen
Author{2}{Lastname}#=%=#Liao
Author{2}{Email}#=%=#liaozicen55@gmail.com
Author{2}{Affiliation}#=%=#School of Computing, Newcastle University, Newcastle upon Tyne, UK
Author{3}{Firstname}#=%=#Huizhi
Author{3}{Lastname}#=%=#Liang
Author{3}{Email}#=%=#huizhi.liang@newcastle.ac.uk
Author{3}{Affiliation}#=%=#School of Computing, Newcastle University, Newcastle upon Tyne, UK

==========
èéáğö