Sheep’s Skin, Wolf’s Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech?
Jingjie Zeng, Liang Yang, Zekun Wang, Yuanyuan Sun, Hongfei Lin
Abstract
Implicit hate speech has become a significant challenge for online platforms, as it often avoids detection by large language models (LLMs) due to its indirectly expressed hateful intent. This study identifies the limitations of LLMs in detecting implicit hate speech, particularly when disguised as seemingly harmless expressions in a rhetorical device. To address this challenge, we employ a Jailbreaking strategy and Energy-based Constrained Decoding techniques, and design a small model for measuring the energy of metaphorical rhetoric. This approach can lead to LLMs generating metaphorical implicit hate speech. Our research reveals that advanced LLMs, like GPT-4o, frequently misinterpret metaphorical implicit hate speech, and fail to prevent its propagation effectively. Even specialized models, like ShieldGemma and LlamaGuard, demonstrate inadequacies in blocking such content, often misclassifying it as harmless speech. This work points out the vulnerability of current LLMs to implicit hate speech, and emphasizes the improvements to address hate speech threats better.- Anthology ID:
- 2025.acl-long.814
- Volume:
- Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 16657–16677
- Language:
- URL:
- https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.814/
- DOI:
- Cite (ACL):
- Jingjie Zeng, Liang Yang, Zekun Wang, Yuanyuan Sun, and Hongfei Lin. 2025. Sheep’s Skin, Wolf’s Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech?. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16657–16677, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Sheep’s Skin, Wolf’s Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech? (Zeng et al., ACL 2025)
- PDF:
- https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.814.pdf