RSA-Bench: Benchmarking Audio Large Models in Real-World Acoustic Scenarios

Yibo Zhang, Kaiwen Luo, Liang Lin, Shilinlu Yan, Jin Wang, Yaoqi Guo, Yitian Chen, Yalan Qin, Zhenhong Zhou, Kun Wang, Li Sun


Abstract
While Audio Large Models (ALLMs) have achieved remarkable proficiency, their robustness remains brittle in real-world deployment. Existing evaluations largely rely on synthetic Gaussian noise or simplistic single-source interference, failing to capture the intricate, multi-layered acoustic dynamics—or "Acoustic Ecology"—that characterize authentic physical environments. To bridge this ecological gap, we introduce RSA-Bench, a comprehensive robustness benchmark designed to stress-test ALLMs through high-fidelity auditory scene simulations. Unlike traditional methods, we construct evaluation samples by naturally superimposing diverse environmental soundscapes—spanning Pasture, Extreme Weather, Classroom, and Outdoors—onto clean speech signals across a spectrum of interference intensities. By evaluating models on six core tasks ranging from fundamental perception to complex reasoning, our study unveils three macro-level insights: (I) The Perception-Cognition Gap: Models maintain relative resilience in low-level recognition but suffer a functional collapse in high-order reasoning tasks under stress; (II) Scenario Sensitivity: "Vocal-like" interference (e.g., children playing) proves significantly more destructive than mechanical noise, challenging the model’s auditory attention mechanisms; and (III) The Denoising Paradox: Standard speech enhancement often exacerbates performance degradation, as ALLMs prove highly sensitive to the semantic distortions introduced by denoising artifacts.
Anthology ID:
2026.findings-acl.913
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
18350–18374
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.913/
DOI:
Bibkey:
Cite (ACL):
Yibo Zhang, Kaiwen Luo, Liang Lin, Shilinlu Yan, Jin Wang, Yaoqi Guo, Yitian Chen, Yalan Qin, Zhenhong Zhou, Kun Wang, and Li Sun. 2026. RSA-Bench: Benchmarking Audio Large Models in Real-World Acoustic Scenarios. In Findings of the Association for Computational Linguistics: ACL 2026, pages 18350–18374, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
RSA-Bench: Benchmarking Audio Large Models in Real-World Acoustic Scenarios (Zhang et al., Findings 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.913.pdf
Checklist:
 2026.findings-acl.913.checklist.pdf