Yayun He


2025

EMO-RL: Emotion-Rule-Based Reinforcement Learning Enhanced Audio-Language Model for Generalized Speech Emotion Recognition
Pengcheng Li | Botao Zhao | Zuheng Kang | Junqing Peng | Xiaoyang Qu | Yayun He | Jianzong Wang
Findings of the Association for Computational Linguistics: EMNLP 2025

Although large audio-language models (LALMs) have demonstrated remarkable capabilities in audio perception, their performance in affective computing scenarios, particularly in emotion recognition, reasoning, and subtle sentiment differentiation, remains suboptimal. Recent advances in reinforcement learning (RL) have shown promise in improving LALMs’ reasoning abilities. However, two critical challenges hinder the direct application of RL techniques to speech emotion recognition (SER) tasks: (1) convergence instability caused by ambiguous emotional boundaries and (2) limited reasoning ability when using relatively small models (e.g., 7B-parameter architectures). To address these challenges, we propose EMO-RL, a novel framework incorporating reinforcement learning with two key innovations: Emotion Similarity-Weighted Reward (ESWR) and Explicit Structured Reasoning (ESR). Built upon pretrained LALMs, our method employs group-relative policy optimization with emotion constraints. Comprehensive experiments demonstrate that our EMO-RL training strategies significantly enhance the emotional reasoning capabilities of LALMs, achieving state-of-the-art performance on the MELD and IEMOCAP datasets; cross-dataset experiments further demonstrate its strong generalization ability.
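
The abstract does not spell out the reward formula, but the Emotion Similarity-Weighted Reward (ESWR) idea can be illustrated with a minimal sketch: instead of an exact-match reward, a prediction that lands on a semantically close emotion earns partial credit, which softens the ambiguous emotional boundaries cited above as a cause of convergence instability. The similarity values, label set, and function names below are illustrative assumptions, not the paper's actual formulation.

    # A minimal sketch of a similarity-weighted reward for SER rollouts.
    # The pairwise similarities below are illustrative placeholders,
    # not the values used in the EMO-RL paper.

    # Hypothetical symmetric similarities in [0, 1); an exact label
    # match always earns the full reward of 1.0.
    EMOTION_SIMILARITY = {
        ("happy", "excited"): 0.6,
        ("sad", "frustrated"): 0.5,
        ("angry", "frustrated"): 0.5,
        ("neutral", "sad"): 0.2,
    }

    def similarity(a: str, b: str) -> float:
        """Look up the symmetric similarity between two emotion labels."""
        if a == b:
            return 1.0
        return EMOTION_SIMILARITY.get((a, b),
                                      EMOTION_SIMILARITY.get((b, a), 0.0))

    def emotion_reward(predicted: str, gold: str) -> float:
        """Full credit for an exact match, partial credit for a near-miss
        emotion, and zero reward for a distant one."""
        return similarity(predicted, gold)

    # A near-miss prediction still receives partial reward, smoothing
    # the reward landscape around ambiguous emotional boundaries.
    print(emotion_reward("excited", "happy"))  # 0.6
    print(emotion_reward("angry", "happy"))    # 0.0

In the full method this reward signal would be combined with the Explicit Structured Reasoning (ESR) component and optimized via group-relative policy optimization with emotion constraints; the sketch above covers only the similarity-weighting idea.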