SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models

Xingjian Diao; Chunhui Zhang; Keyi Kong; Weiyi Wu; Chiyu Ma; Zhongyu Ouyang; Peijun Qing; Soroush Vosoughi; Jiang Gui

SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models

Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, Jiang Gui

Abstract

While large language models have demonstrated impressive reasoning abilities, their extension to the audio modality, particularly within large audio-language models (LALMs), remains underexplored. Addressing this gap requires a systematic approach that involves a capable base model, high-quality reasoning-oriented audio data, and effective training algorithms. In this work, we present a comprehensive solution for audio logical reasoning (ALR) tasks: we introduce SoundMind, a dataset of 6,446 audio–text annotated samples specifically curated to support complex reasoning. Building on this resource, we propose SoundMind-RL, a rule-based reinforcement learning (RL) algorithm designed to equip audio-language models with robust audio–text reasoning capabilities. By fine-tuning Qwen2.5-Omni-7B on the proposed SoundMind dataset using SoundMind-RL, we achieve strong and consistent improvements over state-of-the-art baselines on the SoundMind benchmark. This work highlights the benefit of combining high-quality, reasoning-focused datasets with specialized RL techniques, and contributes to advancing auditory intelligence in language models. The code and dataset are publicly available at https://github.com/xid32/SoundMind.

Anthology ID:: 2025.emnlp-main.27
Volume:: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2025
Address:: Suzhou, China
Editors:: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:: EMNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 528–540
Language:
URL:: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.27/
DOI:
Bibkey:
Cite (ACL):: Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, and Jiang Gui. 2025. SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 528–540, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):: SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models (Diao et al., EMNLP 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.27.pdf
Checklist:: 2025.emnlp-main.27.checklist.pdf

PDF Cite Search Checklist Fix data