HateImgPrompts: Mitigating Generation of Images Spreading Hate Speech

Vineet Kumar Khullar, Venkatesh Velugubantla, Bhanu Prakash Reddy Rella, Mohan Krishna Mannava, Msvpj Sathvik


Abstract
The emergence of artificial intelligence has proven beneficial to numerous organizations, particularly in its applications for social welfare. One notable application is AI-driven image generation: tools that produce images from user-provided prompts. While this technology holds potential for constructive use, it also carries the risk of being exploited for malicious purposes, such as propagating hate. To address this, we propose a novel dataset, “HateImgPrompts”. We have benchmarked the dataset with recent models, including GPT-3.5 and LLAMA 2. The dataset consists of 9467 prompts, and a classifier fine-tuned on it reaches an accuracy of around 81%.
Anthology ID:
2025.nlp4dh-1.53
Volume:
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Month:
May
Year:
2025
Address:
Albuquerque, USA
Editors:
Mika Hämäläinen, Emily Öhman, Yuri Bizzoni, So Miyagawa, Khalid Alnajjar
Venues:
NLP4DH | WS
Publisher:
Association for Computational Linguistics
Pages:
647–652
URL:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.nlp4dh-1.53/
Cite (ACL):
Vineet Kumar Khullar, Venkatesh Velugubantla, Bhanu Prakash Reddy Rella, Mohan Krishna Mannava, and Msvpj Sathvik. 2025. HateImgPrompts: Mitigating Generation of Images Spreading Hate Speech. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, pages 647–652, Albuquerque, USA. Association for Computational Linguistics.
Cite (Informal):
HateImgPrompts: Mitigating Generation of Images Spreading Hate Speech (Khullar et al., NLP4DH 2025)
PDF:
https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.nlp4dh-1.53.pdf