Rongchen Guo


2025

Semantic Differentiation in Speech Emotion Recognition: Insights from Descriptive and Expressive Speech Roles
Rongchen Guo | Vincent Francoeur | Isar Nejadgholi | Sylvain Gagnon | Miodrag Bolic
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)

Speech Emotion Recognition (SER) is essential for improving human-computer interaction, yet its accuracy remains constrained by the complexity of emotional nuances in speech. In this study, we distinguish between descriptive semantics, which represents the contextual content of speech, and expressive semantics, which reflects the speaker’s emotional state. After watching emotionally charged movie segments, we recorded audio clips of participants describing their experiences, along with the intended emotion tags for each clip, participants’ self-rated emotional responses, and their valence/arousal scores. Through experiments we show that descriptive semantics align with intended emotions, while expressive semantics correlate with evoked emotions. Our findings inform SER applications in human-AI interaction and pave the way for more context-aware AI systems.

2024

Adaptable Moral Stances of Large Language Models on Sexist Content: Implications for Society and Gender Discourse
Rongchen Guo | Isar Nejadgholi | Hillary Dawkins | Kathleen C. Fraser | Svetlana Kiritchenko
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

This work provides an explanatory view of how LLMs can apply moral reasoning to both criticize and defend sexist language. We assessed eight large language models, all of which demonstrated the capability to provide explanations grounded in varying moral perspectives for both critiquing and endorsing views that reflect sexist assumptions. With both human and automatic evaluation, we show that all eight models produce comprehensible and contextually relevant text, which is helpful in understanding diverse views on how sexism is perceived. Also, through analysis of moral foundations cited by LLMs in their arguments, we uncover the diverse ideological perspectives in models’ outputs, with some models aligning more with progressive or conservative views on gender roles and sexism. Based on our observations, we caution against the potential misuse of LLMs to justify sexist language. We also highlight that LLMs can serve as tools for understanding the roots of sexist beliefs and designing well-informed interventions. Given this dual capacity, it is crucial to monitor LLMs and design safety mechanisms for their use in applications that involve sensitive societal topics, such as sexism.

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
Zihao Deng | Yinghao Ma | Yudong Liu | Rongchen Guo | Ge Zhang | Wenhu Chen | Wenhao Huang | Emmanouil Benetos
Findings of the Association for Computational Linguistics: NAACL 2024

Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains underexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained, frozen music audio model MERT with a frozen LLM, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from captions in the MusicCaps dataset, tailored for open-ended music inquiries. Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs. Our introduced dataset enables notable advancements beyond previous ones.
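
The abstract above describes the core of MusiLingo's architecture: a single trainable projection layer that maps frozen MERT audio representations into the embedding space of a frozen LLM. Below is a minimal, hedged sketch of that idea in PyTorch; the class name, embedding dimensions, and wiring are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class AudioToLLMProjection(nn.Module):
    """Illustrative sketch: project frozen audio-encoder features into an
    LLM's embedding space. Dimensions below are assumed, not from the paper."""

    def __init__(self, audio_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        # The only trainable component in this sketch: one linear projection.
        self.proj = nn.Linear(audio_dim, llm_dim)

    def forward(self, audio_embeddings: torch.Tensor) -> torch.Tensor:
        # audio_embeddings: (batch, time, audio_dim) from a frozen music encoder
        # returns:          (batch, time, llm_dim) "audio tokens" that can be
        # concatenated with text-prompt embeddings before the frozen LLM.
        return self.proj(audio_embeddings)


# Usage sketch with placeholder features standing in for frozen MERT output.
proj = AudioToLLMProjection()
mert_features = torch.randn(1, 250, 768)   # (batch, time, audio_dim), assumed shape
audio_tokens = proj(mert_features)          # (1, 250, 4096)
```

In this setup only the projection layer would receive gradients during caption and instruction training, which is what keeps the audio encoder and the LLM frozen.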