Jihwan Seol


2025

pdf bib
VoiceBBQ: Investigating Effect of Content and Acoustics in Social Bias of Spoken Language Model
Junhyuk Choi | Ro-hoon Oh | Jihwan Seol | Bugeun Kim
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

We introduce VoiceBBQ, a spoken extension of the BBQ (Bias Benchmark for Question answering) - a dataset that measures social bias by presenting ambiguous or disambiguated contexts followed by questions that may elicit stereotypical responses. Due to the nature of speech modality, social bias in Spoken Language Models (SLMs) can emerge from two distinct sources: 1) content aspect and 2) acoustic aspect. The dataset converts every BBQ context into controlled voice conditions, enabling per-axis accuracy, bias, and consistency scores that remain comparable to the original text benchmark. Using VoiceBBQ, we evaluate two SLMs—LLaMA-Omni and Qwen2-Audio—and observe architectural contrasts: LLaMA-Omni retains strong acoustic sensitivity, amplifying gender and accent bias, whereas Qwen2-Audio substantially dampens these cues while preserving content fidelity. VoiceBBQ thus provides a compact, drop-in testbed for jointly diagnosing content and acoustic bias across spoken language models.