Can LLMs Hear the Dogwhistle?
Yifan Liu, Yi Lin, Xinwei Guo, Ziwei Wang, Jiaxin Zhang, Guanhua Chen, Haiyan Wu, Xiangyu Zhao, Xin Yao, Xuetao Wei
Abstract
With the widespread deployment of large language models (LLMs), existing safety benchmarks remain largely focused on explicitly harmful content, overlooking context-dependent expressions such as dogwhistles: language that conveys harmful intent while appearing benign on the surface. To address this gap, we introduce DogBench, a comprehensive benchmark for evaluating LLM safety under dogwhistle-driven prompts. DogBench comprises 11,150 prompt instances constructed from controlled templates that embed dogwhistle terms, enabling direct comparison with explicit toxic terms under identical prompt structures. Each prompt is further annotated with pragmatic attributes, including interaction category and stance tendency. Extensive evaluations across multiple mainstream LLMs reveal a consistent pattern: dogwhistle prompts are substantially more likely to elicit harmful outputs than their explicit toxic counterparts, with an average risk increase of approximately fourfold. These findings expose a blind spot in current safety evaluation and alignment practices. Our work underscores the need to explicitly incorporate dogwhistles into future LLM safety research, with DogBench serving as a dedicated benchmark for this purpose.
- Anthology ID:
- 2026.findings-acl.161
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3256–3273
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.161/
- Cite (ACL):
- Yifan Liu, Yi Lin, Xinwei Guo, Ziwei Wang, Jiaxin Zhang, Guanhua Chen, Haiyan Wu, Xiangyu Zhao, Xin Yao, and Xuetao Wei. 2026. Can LLMs Hear the Dogwhistle?. In Findings of the Association for Computational Linguistics: ACL 2026, pages 3256–3273, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Can LLMs Hear the Dogwhistle? (Liu et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.161.pdf