Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models

Xiutian Zhao, Björn Schuller, Berrak Sisman


Abstract
Emotion is a central dimension of spoken communication, yet we still lack a mechanistic account of how modern large audio-language models (LALMs) encode it internally. We present the first neuron-level interpretability study of emotion-sensitive neurons (ESNs) in LALMs and provide causal evidence supporting the existence of such units in Qwen2.5-Omni, Kimi-Audio, and Audio Flamingo 3. Across these three widely used open-source models, we compare frequency-, entropy-, mean-deviation-, and contrast-based neuron selectors on multiple emotion recognition benchmarks. Using inference-time interventions, we reveal a consistent emotion-specific signature: deactivating neurons selected for a given emotion disproportionately degrades recognition of that emotion while largely preserving other classes, whereas targeted steering amplifies these units to bias predictions toward the target emotion. These effects arise with modest amounts of identification data and scale systematically with intervention strength. We further observe that ESNs exhibit non-uniform layer-wise clustering with partial cross-dataset transfer. Taken together, our results offer a causal, neuron-level account of emotion decisions in LALMs and highlight targeted neuron interventions as an actionable handle for controllable affective behaviors.
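The select-then-intervene idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the scoring rule below (a neuron's mean activation on the target emotion minus its average mean activation on the other emotions) is an assumed stand-in for a mean-deviation-style selector, and the function names `select_neurons_by_mean_deviation` and `intervene`, along with the toy activations, are hypothetical. Real LALM activations would come from hooks on the model's hidden layers, which this sketch does not cover.

```python
from statistics import mean


def select_neurons_by_mean_deviation(acts_by_emotion, target, top_k):
    """Rank neurons by how far their mean activation on `target`
    exceeds their average mean activation on all other emotions.

    acts_by_emotion: dict mapping emotion label -> list of activation
    vectors (each a list of floats of equal length).
    NOTE: this scoring rule is an assumption made for illustration.
    """
    n_neurons = len(next(iter(acts_by_emotion.values()))[0])
    # Per-emotion mean activation of every neuron.
    class_means = {
        emotion: [mean(vec[i] for vec in vecs) for i in range(n_neurons)]
        for emotion, vecs in acts_by_emotion.items()
    }
    others = [e for e in class_means if e != target]
    scores = [
        class_means[target][i] - mean(class_means[e][i] for e in others)
        for i in range(n_neurons)
    ]
    ranked = sorted(range(n_neurons), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


def intervene(activations, neuron_ids, scale):
    """Rescale the selected neurons at inference time.

    scale = 0.0 deactivates them (ablation); scale > 1.0 amplifies
    them (steering). All other neurons are left unchanged.
    """
    selected = set(neuron_ids)
    return [a * scale if i in selected else a for i, a in enumerate(activations)]


# Toy activations for three neurons (hypothetical values).
toy_acts = {
    "angry": [[0.9, 0.1, 0.2], [1.1, 0.0, 0.3]],
    "happy": [[0.1, 0.8, 0.2], [0.2, 1.0, 0.3]],
    "sad": [[0.0, 0.1, 0.25], [0.1, 0.2, 0.25]],
}
angry_neurons = select_neurons_by_mean_deviation(toy_acts, "angry", top_k=1)
ablated = intervene([1.0, 0.5, 0.25], angry_neurons, scale=0.0)
steered = intervene([1.0, 0.5, 0.25], angry_neurons, scale=2.0)
```

In this toy setup the single `scale` parameter covers both interventions, which mirrors the abstract's observation that effects vary with intervention strength; how the paper parameterizes strength is not specified here.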
Anthology ID:
2026.acl-long.687
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
15056–15071
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.687/
Cite (ACL):
Xiutian Zhao, Björn Schuller, and Berrak Sisman. 2026. Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15056–15071, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Discovering and Causally Validating Emotion-Sensitive Neurons in Large Audio-Language Models (Zhao et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.687.pdf
Checklist:
 2026.acl-long.687.checklist.pdf