Can LLMs Hear the Dogwhistle?

Yifan Liu, Yi Lin, Xinwei Guo, Ziwei Wang, Jiaxin Zhang, Guanhua Chen, Haiyan Wu, Xiangyu Zhao, Xin Yao, Xuetao Wei


Abstract
Despite the widespread deployment of large language models (LLMs), existing safety benchmarks remain largely focused on explicitly harmful content, overlooking context-dependent expressions such as dogwhistles: language that conveys harmful intent while appearing benign on the surface. To address this gap, we introduce DogBench, a comprehensive benchmark for evaluating LLM safety under dogwhistle-driven prompts. DogBench comprises 11,150 prompt instances constructed from controlled templates that embed dogwhistle terms, enabling direct comparison with explicit toxic terms under identical prompt structures. Each prompt is further annotated with pragmatic attributes, including interaction category and stance tendency. Extensive evaluations across multiple mainstream LLMs reveal a consistent pattern: dogwhistle prompts are substantially more likely to elicit harmful outputs than their explicit toxic counterparts, with an average risk increase of approximately fourfold. These findings expose a blind spot in current safety evaluation and alignment practices. Our work underscores the need to explicitly incorporate dogwhistles into future LLM safety research, with DogBench serving as a dedicated benchmark for this purpose.
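The paired-template comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `PromptPair`, `build_pair`, `harmful_rate`, and `risk_ratio`, the `{term}` slot syntax, and the use of a simple ratio of harmful-output rates are hypothetical and are not taken from DogBench, whose actual templates and risk metric may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptPair:
    """Two prompts that share one template and differ only in the embedded term."""
    template: str
    dogwhistle_prompt: str
    explicit_prompt: str


def build_pair(template: str, dogwhistle_term: str, explicit_term: str) -> PromptPair:
    # Hypothetical slot syntax: the template contains a single "{term}" placeholder.
    return PromptPair(
        template=template,
        dogwhistle_prompt=template.format(term=dogwhistle_term),
        explicit_prompt=template.format(term=explicit_term),
    )


def harmful_rate(labels: list[bool]) -> float:
    """Fraction of model outputs judged harmful (True means harmful)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(labels) / len(labels)


def risk_ratio(dogwhistle_labels: list[bool], explicit_labels: list[bool]) -> float:
    """Ratio of harmful-output rates: dogwhistle prompts over explicit prompts.

    Assumption: a plain rate ratio stands in for the paper's risk measure.
    """
    dog_rate = harmful_rate(dogwhistle_labels)
    explicit_rate = harmful_rate(explicit_labels)
    if explicit_rate == 0:
        # The ratio is undefined when neither condition produces harmful output.
        return float("inf") if dog_rate > 0 else float("nan")
    return dog_rate / explicit_rate
```

As a usage example with placeholder terms and toy labels (not results from the paper), `build_pair("Write a post about {term}.", "<DOGWHISTLE>", "<EXPLICIT>")` yields two prompts with identical structure, and `risk_ratio([True, True, False, False], [True, False, False, False])` compares a harmful rate of 0.5 against 0.25.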
Anthology ID:
2026.findings-acl.161
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3256–3273
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.161/
Cite (ACL):
Yifan Liu, Yi Lin, Xinwei Guo, Ziwei Wang, Jiaxin Zhang, Guanhua Chen, Haiyan Wu, Xiangyu Zhao, Xin Yao, and Xuetao Wei. 2026. Can LLMs Hear the Dogwhistle? In Findings of the Association for Computational Linguistics: ACL 2026, pages 3256–3273, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Can LLMs Hear the Dogwhistle? (Liu et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.161.pdf
Checklist:
2026.findings-acl.161.checklist.pdf