Analyzing Key Factors Influencing Emotion Prediction Performance of VLLMs in Conversational Contexts

Jaewook Lee; Yeajin Jang; Hongjin Kim; Woojin Lee; Harksoo Kim

doi:10.18653/v1/2024.emnlp-main.331

Abstract

Emotional intelligence (EI) in artificial intelligence (AI), which refers to the ability of an AI to understand and respond appropriately to human emotions, has emerged as a crucial research topic. Recent studies have shown that large language models (LLMs) and vision large language models (VLLMs) possess EI and the ability to understand emotional stimuli in the form of text and images, respectively. However, factors influencing the emotion prediction performance of VLLMs in real-world conversational contexts have not been sufficiently explored. This study aims to systematically analyze the key elements affecting the emotion prediction performance of VLLMs in conversational contexts. To achieve this, we reconstructed the MELD dataset, which is based on the popular TV series Friends, and conducted experiments through three sub-tasks: overall emotion tone prediction, character emotion prediction, and contextually appropriate emotion expression selection. We evaluated the performance differences based on various model architectures (e.g., image encoders, modality alignment, and LLMs) and image scopes (e.g., entire scene, person, and facial expression). In addition, we investigated the impact of providing persona information on the emotion prediction performance of the models and analyzed how personality traits and speaking styles influenced the emotion prediction process. We also conducted an in-depth analysis of the impact of other factors, such as gender and regional biases, on the emotion prediction performance of VLLMs. The results revealed that these factors significantly influenced the model performance.

Anthology ID: 2024.emnlp-main.331
Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 5801–5816
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.331/
DOI: 10.18653/v1/2024.emnlp-main.331
Cite (ACL): Jaewook Lee, Yeajin Jang, Hongjin Kim, Woojin Lee, and Harksoo Kim. 2024. Analyzing Key Factors Influencing Emotion Prediction Performance of VLLMs in Conversational Contexts. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 5801–5816, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Analyzing Key Factors Influencing Emotion Prediction Performance of VLLMs in Conversational Contexts (Lee et al., EMNLP 2024)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.emnlp-main.331.pdf
