Evaluating Multimodal Large Language Model Narrative Interpretation through the Lens of Appraisal Theory

Jayant Teotia; Xiaowei Wang; Xulang Zhang; Rui Mao; Erik Cambria

Abstract

Narrative interpretation is an essential aspect of human cognition, enabling individuals to comprehend complex sequences of events, form emotional connections, and engage in nuanced social reasoning. At the heart of this interpretive ability lies emotional understanding, which cognitive scientists often frame through Appraisal Theory, a model that views emotions as the outcome of subjective evaluations of events in relation to goals, values, and beliefs. In this study, we explore whether multimodal large language models (MLLMs) are able to replicate aspects of this human-like narrative and emotional reasoning. Specifically, we examine how well MLLMs interpret visual narratives, with a focus on their ability to identify and appraise emotional content within scenes. We also investigate whether these models can utilize additional narrative descriptions generated by them to enhance their emotional recognition capabilities, as humans often do. To probe these questions, we conducted a series of experiments using two publicly available datasets, EMOTIC and HECO. Contrary to our expectations, our results reveal a consistent and noteworthy pattern: rather than improving the models’ performance, the inclusion of supplementary narrative or contextual information frequently diminishes their ability to accurately recognize emotions. This counterintuitive finding suggests that current MLLMs face significant challenges in integrating multimodal information in a coherent, context-sensitive way. These findings underscore key limitations in the emotional and narrative reasoning capabilities of existing MLLMs and highlight a critical gap between human cognitive processes and current AI approaches.

Anthology ID: 2026.lrec-main.893
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 11417–11426
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.893/
Cite (ACL): Jayant Teotia, Xiaowei Wang, Xulang Zhang, Rui Mao, and Erik Cambria. 2026. Evaluating Multimodal Large Language Model Narrative Interpretation through the Lens of Appraisal Theory. International Conference on Language Resources and Evaluation, main:11417–11426.
Cite (Informal): Evaluating Multimodal Large Language Model Narrative Interpretation through the Lens of Appraisal Theory (Teotia et al., LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.893.pdf