Xulang Zhang

2026

Evaluating Multimodal Large Language Model Narrative Interpretation through the Lens of Appraisal Theory
Jayant Teotia | Xiaowei Wang | Xulang Zhang | Rui Mao | Erik Cambria
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Narrative interpretation is an essential aspect of human cognition, enabling individuals to comprehend complex sequences of events, form emotional connections, and engage in nuanced social reasoning. At the heart of this interpretive ability lies emotional understanding, which cognitive scientists often frame through Appraisal Theory, a model that views emotions as the outcome of subjective evaluations of events in relation to goals, values, and beliefs. In this study, we explore whether multimodal large language models (MLLMs) are able to replicate aspects of this human-like narrative and emotional reasoning. Specifically, we examine how well MLLMs interpret visual narratives, with a focus on their ability to identify and appraise emotional content within scenes. We also investigate whether these models can utilize additional narrative descriptions generated by them to enhance their emotional recognition capabilities, as humans often do. To probe these questions, we conducted a series of experiments using two publicly available datasets, EMOTIC and HECO. Contrary to our expectations, our results reveal a consistent and noteworthy pattern: rather than improving the models’ performance, the inclusion of supplementary narrative or contextual information frequently diminishes their ability to accurately recognize emotions. This counterintuitive finding suggests that current MLLMs face significant challenges in integrating multimodal information in a coherent, context-sensitive way. These findings underscore key limitations in the emotional and narrative reasoning capabilities of existing MLLMs and highlight a critical gap between human cognitive processes and current AI approaches.

Implicit Bias in Peer Review: Through the Lens of Language Abstraction
Xulang Zhang | Rui Mao | Erik Cambria
Proceedings of the Fifteenth Language Resources and Evaluation Conference

Peer review is essential for the scholarly publishing process. However, its credibility is increasingly called into question. Bias is one of the aspects worthy of investigation. Existing research mostly focuses on predefined, explicit bias types, which are insufficient for analyzing the myriad implicit biases in peer review. Thus, we proposed to study the bias in peer review through the lens of language abstraction, informed by cognitive theories suggesting that the frequency of abstraction in descriptions plays a latent yet important role in bias transmission. Hence, we trained a model to assess the abstraction level of text, and applied it to a review dataset to examine the connection between abstraction and the implicit biases in peer reviews. Results show that there are indeed observable quantitative differences in the abstraction use of reviews recommending rejection versus those recommending acceptance. Furthermore, reviews for the rejected papers tend to be more abstract than ones for the accepted papers, indicating possible transmission of implicit bias. To the best of our knowledge, our study is the first to examine generalized Linguistic Intergroup Bias in the academic text domain.

The Evolution of Philosophy: A Metaphorical Cognition Perspective
Rui Mao | Dapeng Chen | Zihao Huang | Xulang Zhang | Erik Cambria
Proceedings of the Fifteenth Language Resources and Evaluation Conference

We present a large-scale study of philosophical cognition through the lens of Conceptual Metaphor Theory. Using a computational metaphor processing system that extracts target concepts, source concepts, and concept mappings from a curated corpus of 50+ canonical texts (300k sentences) spanning ten schools from antiquity to the late twentieth century, we quantify how metaphor organizes philosophical argument. We model temporal dynamics with year-level cosine series, authorial neighborhoods with PCA projections, and school signatures with heatmaps of normalized frequencies. The study demonstrates that the history of philosophy is structured by stable cross-domain schemas that are selectively recombined to address new problems.

2024

SenticVec: Toward Robust and Human-Centric Neurosymbolic Sentiment Analysis
Xulang Zhang | Rui Mao | Erik Cambria
Findings of the Association for Computational Linguistics: ACL 2024

The success of state-of-the-art Natural Language Processing (NLP) systems heavily depends on deep neural networks, which excel in various tasks through strong data fitting and latent feature modeling abilities. However, certain challenges linked to deep neural networks and supervised deep learning deserve consideration, e.g., extensive computing resources and knowledge forgetting. Previous research attempted to tackle these challenges individually through unrelated techniques. However, these techniques do not instigate fundamental shifts in the learning paradigm. In this work, we propose a novel neurosymbolic method for sentiment analysis to tackle these issues. We also propose a novel sentiment-pragmatic knowledge base that places emphasis on human subjectivity within varying domain annotations. We conducted extensive experiments to show that our neurosymbolic framework for sentiment analysis stands out for its lightweight nature, robustness across domains and languages, efficient few-shot training, and rapid convergence.

Vanessa: Visual Connotation and Aesthetic Attributes Understanding Network for Multimodal Aspect-based Sentiment Analysis
Luwei Xiao | Rui Mao | Xulang Zhang | Liang He | Erik Cambria
Findings of the Association for Computational Linguistics: EMNLP 2024

Prevailing research concentrates on superficial features or descriptions of images, revealing a significant gap in the systematic exploration of their connotative and aesthetic attributes. Furthermore, the use of cross-modal relation detection modules to eliminate noise from comprehensive image representations leads to the omission of subtle contextual information. In this paper, we present a Visual Connotation and Aesthetic Attributes Understanding Network (Vanessa) for Multimodal Aspect-based Sentiment Analysis. Concretely, Vanessa incorporates a Multi-Aesthetic Attributes Aggregation (MA3) module that models intra- and inter-dependencies among bi-modal representations as well as emotion-laden aesthetic attributes. Moreover, we devise a self-supervised contrastive learning framework to explore the pairwise relevance between images and text via the Gaussian distribution of their CLIP scores. By dynamically clustering and merging multi-modal tokens, Vanessa effectively captures both implicit and explicit sentimental cues. Extensive experiments on two widely adopted benchmarks verify Vanessa’s effectiveness.

GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao | Guanyi Chen | Xulang Zhang | Frank Guerin | Erik Cambria
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The emergence of ChatGPT has generated much speculation in the press about its potential to disrupt social and economic systems. Its astonishing language ability has aroused strong curiosity among scholars about its performance in different domains. There have been many studies evaluating the ability of ChatGPT and GPT-4 in different tasks and disciplines. However, a comprehensive review summarizing the collective assessment findings is lacking. The objective of this survey is to thoroughly analyze prior assessments of ChatGPT and GPT-4, focusing on their language and reasoning abilities, scientific knowledge, and ethical considerations. Furthermore, an examination of the existing evaluation methods is conducted, offering several recommendations for future research.

2023

Neuro-Symbolic Sentiment Analysis with Dynamic Word Sense Disambiguation
Xulang Zhang | Rui Mao | Kai He | Erik Cambria
Findings of the Association for Computational Linguistics: EMNLP 2023

Sentiment analysis is a task that highly depends on the understanding of word senses. Traditional neural network models are black boxes that represent word senses as vectors that are uninterpretable for humans. On the other hand, the application of Word Sense Disambiguation (WSD) systems in downstream tasks poses challenges regarding i) which words need to be disambiguated, and ii) how to model explicit word senses into easily understandable terms for a downstream model. This work proposes a neurosymbolic framework that incorporates WSD by identifying and paraphrasing ambiguous words to improve the accuracy of sentiment predictions. The framework allows us to understand which words are paraphrased into which semantically unequivocal words, thus enabling a downstream task model to gain both accuracy and interpretability. To better fine-tune a lexical substitution model for WSD on a downstream task without ground-truth word sense labels, we leverage dynamic rewarding to jointly train sentiment analysis and lexical substitution models. Our framework proves to effectively improve the performance of sentiment analysis on corpora from different domains.

Co-authors

Kai He 1
