Shun Chen
2026
CAIR: Causal Adaptive Information-based Reinforcement Learning for Multimodal Emotion Reasoning
Fengyu Zhang | Bin Liu | Jianhua Tao | Zhuofan Wen | Shun Chen | Hailiang Yao | Zhengqi Wen
Findings of the Association for Computational Linguistics: ACL 2026
Multimodal emotion reasoning requires both accurate identification and logical rationales to explain emotional triggers. However, current methods often suffer from causal degeneracy, where models produce linguistically fluent but superficial explanations that lack authentic logical derivation. To resolve this, we propose CAIR (Causal Adaptive Information-based Reinforcement Learning), a reinforcement learning framework that treats rationales as causal mediators between raw perceptual signals and emotional semantics. Our core contribution is the Causal Mediation Reward (CMR), which quantifies a rationale’s interventional utility by measuring its marginal contribution to resolving predictive uncertainty. Additionally, we introduce an adaptive optimization mechanism based on the information bottleneck to balance perception and reasoning across varying cognitive loads. CAIR achieves state-of-the-art performance on MTMEUR with 73.80% accuracy and competitive results on the SCEA subset of EmoBench-M (68.5%), outperforming specialized SFT baselines by up to 14.4% while enhancing rationale faithfulness. Our findings underscore that principled reward design, rather than mere model scaling, is essential for building systems with authentic, human-like emotional understanding.
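One way to read "marginal contribution to resolving predictive uncertainty" is as a drop in the entropy of the model's emotion prediction once the rationale is added to the input. The sketch below illustrates that reading only. The abstract does not give the CMR formula, so the function names, the entropy-difference form, and the example probabilities are all assumptions, not the authors' definition.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_reduction(p_without, p_with):
    """Entropy drop when a rationale is added to the input.

    Hypothetical stand-in for a mediation-style reward; it is not
    the CMR formula from the paper.
    """
    return entropy(p_without) - entropy(p_with)


# Made-up emotion posteriors over (happy, sad, angry).
p_perception_only = [0.4, 0.3, 0.3]  # prediction from raw signals alone
p_with_rationale = [0.8, 0.1, 0.1]   # prediction once the rationale is given

reward = uncertainty_reduction(p_perception_only, p_with_rationale)
print(f"uncertainty reduction: {reward:.3f} nats")
```

A rationale that leaves the prediction unchanged scores zero under this reading. Entropy alone does not check whether the sharper prediction is correct, so a practical reward would presumably also involve the gold label.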
2025
Listen, Watch, and Learn to Feel: Retrieval-Augmented Emotion Reasoning for Compound Emotion Generation
Zhuofan Wen | Zheng Lian | Shun Chen | Hailiang Yao | Longjiang Yang | Bin Liu | Jianhua Tao
Findings of the Association for Computational Linguistics: ACL 2025
The ability to comprehend human emotion using multimodal large language models (MLLMs) is essential for advancing human-AI interaction and multimodal sentiment analysis. While psychology theory-based human annotations have contributed to multimodal emotion tasks, the subjective nature of emotional perception often leads to inconsistent annotations, limiting the robustness of current models. Addressing these challenges requires more fine-grained methods and evaluation frameworks. In this paper, we propose the Retrieval-Augmented Emotion Reasoning (RAER) framework, a plug-and-play module that enhances MLLMs’ ability to tackle compound and context-rich emotion tasks. To systematically evaluate model performance, we introduce the Stimulus-Armed Bandit (SAB) framework, designed to benchmark emotional reasoning capabilities. Additionally, we construct the Compound Emotion QA dataset, an AI-generated multimodal dataset aimed at strengthening emotion understanding in MLLMs. Experimental results demonstrate the effectiveness of RAER across both traditional benchmarks and SAB evaluations, highlighting its potential to enhance emotional intelligence in multimodal AI systems.