Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models

Hao Yang (杨浩); Lizhen Qu; Ehsan Shareghi; Gholamreza Haffari

doi:10.18653/v1/2025.emnlp-main.510

Abstract

Large Audio Language Models (LALMs) have extended the capabilities of Large Language Models (LLMs) by enabling audio-based human interactions. However, recent research has revealed that LALMs remain vulnerable to harmful queries due to insufficient safety alignment. Despite advances in defence measures for text and vision LLMs, effective safety-alignment strategies and audio-safety datasets specifically targeting LALMs are notably absent. Meanwhile, defence measures based on Supervised Fine-tuning (SFT) struggle to improve safety without causing over-rejection, which significantly compromises helpfulness. In this work, we propose an unsupervised safety-fine-tuning strategy as a remedy that reshapes the model’s representation space to enhance the safety alignment of existing LALMs while balancing the risk of over-rejection. Our experiments, conducted across three generations of Qwen LALMs, demonstrate that our approach significantly improves LALM safety under three input-modality conditions (audio-text, text-only, and audio-only) while increasing the over-rejection rate by only 0.88% on average. Warning: this paper contains harmful examples.

Anthology ID: 2025.emnlp-main.510
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 10078–10090
URL: https://preview.aclanthology.org/author-page-abhinav-gupta-mila/2025.emnlp-main.510/
DOI: 10.18653/v1/2025.emnlp-main.510
Cite (ACL): Hao Yang, Lizhen Qu, Ehsan Shareghi, and Gholamreza Haffari. 2025. Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 10078–10090, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models (Yang et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/author-page-abhinav-gupta-mila/2025.emnlp-main.510.pdf
Checklist: 2025.emnlp-main.510.checklist.pdf
