Ziwei Wang
2026
Can LLMs Hear the Dogwhistle?
Yifan Liu | Yi Lin | Xinwei Guo | Ziwei Wang | Jiaxin Zhang | Guanhua Chen | Haiyan Wu | Xiangyu Zhao | Xin Yao | Xuetao Wei
Findings of the Association for Computational Linguistics: ACL 2026
With the widespread deployment of large language models (LLMs), existing safety benchmarks remain largely focused on explicitly harmful content, overlooking context-dependent expressions such as dogwhistles: language that conveys harmful intent while appearing benign on the surface. To address this gap, we introduce DogBench, a comprehensive benchmark for evaluating LLM safety under dogwhistle-driven prompts. DogBench comprises 11,150 prompt instances constructed from controlled templates that embed dogwhistle terms, enabling direct comparison with explicit toxic terms under identical prompt structures. Each prompt is further annotated with pragmatic attributes, including interaction category and stance tendency. Extensive evaluations across multiple mainstream LLMs reveal a consistent pattern: dogwhistle prompts are substantially more likely to elicit harmful outputs than their explicit toxic counterparts, with an average risk increase of approximately fourfold. These findings expose a blind spot in current safety evaluation and alignment practices. Our work underscores the need to explicitly incorporate dogwhistles into future LLM safety research, with DogBench serving as a dedicated benchmark for this purpose.