GRPO-Guided Modality Selection Enhanced LoRA-Tuned LLMs for Multimodal Emotion Recognition

Yang Chen, Shuwan Yang, Yan Xiang, Ran Song, Yuxin Huang, Zhengtao Yu


Abstract
Multimodal emotion recognition in conversation (MERC) aims to identify speakers’ emotional states by using text, audio, and visual modalities. Although recent large language model (LLM)-based methods achieve strong performance, they typically adopt static fusion strategies that integrate all available modalities uniformly, overlooking the fact that the need for multimodal cues can vary significantly across utterances. In this work, we propose an adaptive modality selection framework for MERC. Its core is a modality selection module based on Group Relative Policy Optimization (GRPO), which enables a LoRA-tuned LLM to reason, via chain-of-thought (CoT) generation, about whether multimodal input is necessary for a given utterance. The module requires no manually labeled modality-selection data and is trained in a fully unsupervised manner. The selected modality configuration is then provided as input to a downstream emotion classifier, also implemented as a LoRA-tuned LLM and trained to predict emotional states. Experimental results on benchmark multimodal dialogue datasets show that our method consistently outperforms strong baselines, demonstrating the effectiveness of adaptive modality selection for improving recognition accuracy. Our code is available at https://github.com/youflyaway/Modality-Selection-Enhanced-LoRA-Tuned-LLMs.
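To make the GRPO training signal described above concrete, the listing below is a minimal sketch (not the authors' released code) of how group-relative advantages could be computed over sampled modality configurations for a single utterance. The reward shape, the 0.05 per-modality penalty, and the function names (selection_reward, group_relative_advantages) are illustrative assumptions, not details from the paper.

import torch

def group_relative_advantages(rewards, eps=1e-6):
    # GRPO normalizes each reward against its own sampling group:
    # A_i = (r_i - mean(r)) / (std(r) + eps)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def selection_reward(selected_modalities, prediction, label):
    # Hypothetical reward: +1 for a correct emotion prediction,
    # minus a small cost for each modality beyond text.
    correct = 1.0 if prediction == label else 0.0
    cost = 0.05 * (len(selected_modalities) - 1)
    return correct - cost

# One utterance, a group of G = 4 sampled modality configurations
# paired with the emotion predicted by the downstream classifier.
samples = [
    ({"text"}, "joy"),
    ({"text", "audio"}, "joy"),
    ({"text", "visual"}, "anger"),
    ({"text", "audio", "visual"}, "joy"),
]
gold = "joy"

rewards = torch.tensor([selection_reward(m, p, gold) for m, p in samples])
advantages = group_relative_advantages(rewards)
print(rewards.tolist())      # [1.0, 0.95, -0.05, 0.9]
print(advantages.tolist())   # positive for configurations above the group mean

Configurations that predict correctly with fewer modalities receive above-average rewards, so the policy is pushed toward requesting extra modalities only when they help; this mirrors the paper's goal of adaptive, unsupervised modality selection, though the actual reward design may differ.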
Anthology ID:
2025.findings-emnlp.1059
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
19458–19471
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1059/
DOI:
10.18653/v1/2025.findings-emnlp.1059
Cite (ACL):
Yang Chen, Shuwan Yang, Yan Xiang, Ran Song, Yuxin Huang, and Zhengtao Yu. 2025. GRPO-Guided Modality Selection Enhanced LoRA-Tuned LLMs for Multimodal Emotion Recognition. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19458–19471, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
GRPO-Guided Modality Selection Enhanced LoRA-Tuned LLMs for Multimodal Emotion Recognition (Chen et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1059.pdf
Checklist:
 2025.findings-emnlp.1059.checklist.pdf