Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations

Sheng-Lun Wei, Yu-Ling Liao, Yen-Hua Chang, Hen-Hsen Huang, Hsin-Hsi Chen


Abstract
Recent multimodal large language models (MLLMs) extend language understanding beyond text to speech, enabling unified reasoning across modalities. While biases in text-based LLMs have been widely examined, their persistence and manifestation in spoken inputs remain underexplored. This work presents the first systematic investigation of speech bias in multilingual MLLMs.We construct and release the BiasInEar Dataset, a speech-augmented benchmark based on Global MMLU Lite, spanning English, Chinese, and Korean, balanced by gender and accent, and totaling 70.8 hours (4,249 minutes) of speech with 11,200 questions. Using four complementary metrics (accuracy, entropy, APES, and Fleiss’ 𝜅), we evaluate nine representative models under linguistic language and accent, demographic gender, and structural option order perturbations. Our findings reveal that MLLMs are relatively robust to demographic factors but highly sensitive to language and option order, suggesting that speech can amplify existing structural biases. Moreover, architectural design and reasoning strategy substantially affect robustness across languages. Overall, this study establishes a unified framework for assessing fairness and robustness in speech-integrated LLMs, bridging the gap between text- and speech-based evaluation.
Anthology ID:
2026.findings-eacl.80
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
1570–1589
Language:
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.80/
DOI:
Bibkey:
Cite (ACL):
Sheng-Lun Wei, Yu-Ling Liao, Yen-Hua Chang, Hen-Hsen Huang, and Hsin-Hsi Chen. 2026. Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations. In Findings of the Association for Computational Linguistics: EACL 2026, pages 1570–1589, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations (Wei et al., Findings 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.80.pdf
Checklist:
 2026.findings-eacl.80.checklist.pdf