Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment

Hanlin Wu, Xufeng Duan, Zhenguang Cai


Abstract
Voice-based AI development faces unique challenges in processing both linguistic and paralinguistic information. This study compares how large audio-language models (LALMs) and humans integrate speaker characteristics during speech comprehension, asking whether LALMs process speaker-contextualized language in ways that parallel human cognitive mechanisms. We compared two LALMs’ (Qwen2-Audio and Ultravox 0.5) processing patterns with human EEG responses. Using surprisal and entropy metrics from the models, we analyzed their sensitivity to speaker-content incongruency across social stereotype violations (e.g., a man claiming to regularly get manicures) and biological knowledge violations (e.g., a man claiming to be pregnant). Results revealed that Qwen2-Audio exhibited increased surprisal for speaker-incongruent content and its surprisal values significantly predicted human N400 responses, while Ultravox 0.5 showed limited sensitivity to speaker characteristics. Importantly, neither model replicated the human-like processing distinction between social violations (eliciting N400 effects) and biological violations (eliciting P600 effects). These findings reveal both the potential and limitations of current LALMs in processing speaker-contextualized language, and suggest differences in social-linguistic processing mechanisms between humans and LALMs.
Anthology ID:
2025.cmcl-1.18
Volume:
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Month:
May
Year:
2025
Address:
Albuquerque, New Mexico, USA
Editors:
Tatsuki Kuribayashi, Giulia Rambelli, Ece Takmaz, Philipp Wicke, Jixing Li, Byung-Doh Oh
Venues:
CMCL | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
135–143
Language:
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.cmcl-1.18/
DOI:
Bibkey:
Cite (ACL):
Hanlin Wu, Xufeng Duan, and Zhenguang Cai. 2025. Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 135–143, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment (Wu et al., CMCL 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.cmcl-1.18.pdf