Protecting Bystander Privacy via Selective Hearing in Audio LLMs

Xiao Zhan; Guangzhi Sun; Jose Such; Phil Woodland

Abstract

Audio large language models (LLMs) are increasingly deployed in the real world, where they inevitably capture speech from unintended nearby bystanders, raising privacy risks that existing benchmarks and defences have not considered. We introduce SH-Bench, the first benchmark designed to evaluate selective hearing: a model’s ability to attend to an intended main speaker while refusing to process or reveal information about incidental bystander speech. SH-Bench contains 3,968 multi-speaker audio mixtures, covering both real-world and synthetic scenarios, paired with 77k multiple-choice questions that probe models under general and selective operating modes. In addition, we propose Selective Efficacy (SE), a novel metric that captures both multi-speaker comprehension and bystander-privacy protection. Our evaluation of state-of-the-art open-source and proprietary LLMs reveals substantial bystander privacy leakage: strong audio understanding does not translate into selective protection of bystander privacy. To close this gap, we also present Bystander Privacy Fine-Tuning (BPFT), a novel training pipeline that teaches models to refuse bystander-related queries without degrading main-speaker comprehension. BPFT yields substantial gains, achieving an absolute improvement of 47% in bystander accuracy under selective mode and an absolute improvement of 16% in SE over Gemini 2.5 Pro, the best audio LLM without BPFT. Together, SH-Bench and BPFT provide the first systematic framework for measuring and improving bystander privacy in audio LLMs.

Anthology ID: 2026.acl-long.693
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 15180–15192
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.693/
Cite (ACL): Xiao Zhan, Guangzhi Sun, Jose Such, and Phil Woodland. 2026. Protecting Bystander Privacy via Selective Hearing in Audio LLMs. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15180–15192, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Protecting Bystander Privacy via Selective Hearing in Audio LLMs (Zhan et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.693.pdf
Checklist: 2026.acl-long.693.checklist.pdf
