MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition

Zhongyu Yang, Junhao Song, Siyang Song, Wei Pang, Yingfang Yuan


Abstract
Multimodal large language models (MLLMs) have demonstrated strong performance across diverse multimodal tasks, yet their application to emotion recognition in natural images remains underexplored. In particular, MLLMs struggle with ambiguous emotional expressions and implicit affective cues, a capability that is crucial for affective understanding but largely overlooked. To address these challenges, we propose MERMAID, a novel multi-agent framework that integrates a multi-perspective self-reflection module, an emotion-guided visual augmentation module, and a cross-modal verification module. These components enable agents to interact across modalities and reinforce subtle emotional semantics, thereby enhancing emotion recognition while operating autonomously. Extensive experiments show that MERMAID outperforms existing methods, achieving absolute accuracy gains of 8.70%–27.90% across diverse benchmarks and exhibiting greater robustness in emotionally diverse scenarios.
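
The abstract describes a three-module, multi-agent architecture. As a reading aid only, below is a minimal Python sketch of how such a pipeline could be composed; every class name, function name, and heuristic here is a hypothetical illustration, not the authors' implementation, and the real system would back each step with MLLM and generative-model calls.

# Hypothetical sketch of the three-module pipeline named in the abstract.
# All identifiers are illustrative; the paper's actual interfaces may differ.
from dataclasses import dataclass

@dataclass
class EmotionHypothesis:
    label: str         # candidate emotion, e.g. "joy"
    rationale: str     # agent's self-reflective justification
    confidence: float  # agent's self-assessed score in [0, 1]

def self_reflect(image_desc: str, perspectives: list[str]) -> list[EmotionHypothesis]:
    """Multi-perspective self-reflection: each perspective agent proposes
    and critiques a hypothesis (stubbed here with a trivial heuristic)."""
    hypotheses = []
    for view in perspectives:
        label = "joy" if "smile" in image_desc else "neutral"  # stand-in for an MLLM call
        hypotheses.append(EmotionHypothesis(label, f"{view} reading of '{image_desc}'", 0.6))
    return hypotheses

def augment_visual(image_desc: str, target_emotion: str) -> str:
    """Emotion-guided visual augmentation: reinforce subtle cues tied to the
    leading hypothesis (a generative model would do this in practice)."""
    return f"{image_desc} [augmented to emphasize cues of {target_emotion}]"

def cross_modal_verify(hypotheses: list[EmotionHypothesis], augmented_desc: str) -> EmotionHypothesis:
    """Cross-modal verification: prefer hypotheses consistent with the
    augmented visual evidence, then return the most confident one."""
    for h in hypotheses:
        if h.label in augmented_desc:
            h.confidence += 0.2  # toy consistency bonus
    return max(hypotheses, key=lambda h: h.confidence)

if __name__ == "__main__":
    desc = "a person with a faint smile at a farewell party"
    hyps = self_reflect(desc, ["facial-expression", "scene-context", "body-language"])
    aug = augment_visual(desc, hyps[0].label)
    final = cross_modal_verify(hyps, aug)
    print(final.label, "-", final.rationale)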
Anthology ID:
2025.emnlp-main.1252
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
24650–24666
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1252/
Cite (ACL):
Zhongyu Yang, Junhao Song, Siyang Song, Wei Pang, and Yingfang Yuan. 2025. MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 24650–24666, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition (Yang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1252.pdf
Checklist:
2025.emnlp-main.1252.checklist.pdf