Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment

Hanlin Wu; Xufeng Duan; Zhenguang Cai

Abstract

Voice-based AI development faces unique challenges in processing both linguistic and paralinguistic information. This study compares how large audio-language models (LALMs) and humans integrate speaker characteristics during speech comprehension, asking whether LALMs process speaker-contextualized language in ways that parallel human cognitive mechanisms. We compared two LALMs’ (Qwen2-Audio and Ultravox 0.5) processing patterns with human EEG responses. Using surprisal and entropy metrics from the models, we analyzed their sensitivity to speaker-content incongruency across social stereotype violations (e.g., a man claiming to regularly get manicures) and biological knowledge violations (e.g., a man claiming to be pregnant). Results revealed that Qwen2-Audio exhibited increased surprisal for speaker-incongruent content and its surprisal values significantly predicted human N400 responses, while Ultravox 0.5 showed limited sensitivity to speaker characteristics. Importantly, neither model replicated the human-like processing distinction between social violations (eliciting N400 effects) and biological violations (eliciting P600 effects). These findings reveal both the potential and limitations of current LALMs in processing speaker-contextualized language, and suggest differences in social-linguistic processing mechanisms between humans and LALMs.
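The surprisal and entropy metrics mentioned in the abstract can be illustrated with a minimal sketch. The code below uses the standard information-theoretic definitions over a next-token distribution. It is not the authors' pipeline: the function names, the toy logits, the candidate words, and the choice of log base 2 are all assumptions made for illustration, and a real LALM would supply logits over its full vocabulary, conditioned on the audio input.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def surprisal(probs, target_index):
    """Surprisal in bits of the word that actually occurred: -log2 p(word | context)."""
    return -math.log2(probs[target_index])


def entropy(probs):
    """Shannon entropy in bits of the whole next-word distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical toy example: made-up logits over four candidate continuations.
# These numbers are invented for illustration, not model outputs.
candidates = ["tired", "hungry", "late", "pregnant"]
toy_logits = [2.0, 1.5, 1.0, -1.0]
toy_probs = softmax(toy_logits)

# A low-probability continuation yields higher surprisal than a likely one.
likely = surprisal(toy_probs, candidates.index("tired"))
unlikely = surprisal(toy_probs, candidates.index("pregnant"))
print(f"surprisal(tired)    = {likely:.2f} bits")
print(f"surprisal(pregnant) = {unlikely:.2f} bits")
print(f"entropy             = {entropy(toy_probs):.2f} bits")
```

Under these definitions, surprisal measures how unexpected the observed word is, while entropy measures how uncertain the model is before the word arrives. In a speaker-incongruency design such as the one described, one would compare these values for the same sentence paired with different speaker voices.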

Anthology ID: 2025.cmcl-1.18
Volume: Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics
Month: May
Year: 2025
Address: Albuquerque, New Mexico, USA
Editors: Tatsuki Kuribayashi, Giulia Rambelli, Ece Takmaz, Philipp Wicke, Jixing Li, Byung-Doh Oh
Venues: CMCL | WS
Publisher: Association for Computational Linguistics
Pages: 135–143
URL: https://preview.aclanthology.org/fix-sig-urls/2025.cmcl-1.18/
Cite (ACL): Hanlin Wu, Xufeng Duan, and Zhenguang Cai. 2025. Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 135–143, Albuquerque, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal): Distinct social-linguistic processing between humans and large audio-language models: Evidence from model-brain alignment (Wu et al., CMCL 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.cmcl-1.18.pdf
