Jinghe Yu
2026
AICA-Bench: Holistically Examining the Capabilities of VLMs in Affective Image Content Analysis
Dong She | Xianrong Yao | Liqun Chen | Jinghe Yu | Yang Gao | Zhanpeng Jin
Findings of the Association for Computational Linguistics: ACL 2026
Vision-Language Models (VLMs) have demonstrated strong capabilities in perception, yet holistic Affective Image Content Analysis (AICA)—which integrates perception, reasoning, and generation into a unified framework—remains underexplored. To address this, we introduce AICA-Bench, a comprehensive benchmark comprising three core tasks: Emotion Understanding (EU), Reasoning (ER), and Generation (EGCG). We evaluate 23 VLMs, revealing critical gaps: models struggle with intensity calibration and suffer from descriptive shallowness in open-ended tasks. To bridge these gaps, we propose Grounded Affective Tree (GAT) Prompting, a training-free framework that integrates visual scaffolding with hierarchical reasoning. Experiments show that GAT effectively corrects intensity errors and significantly enhances descriptive depth, establishing a robust baseline for future affective multimodal research.