ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content

Bhavik Chandna, Mariam Aboujenane, Usman Naseem


Abstract
Large Multimodal Models (LMMs) are increasingly vulnerable to AI-generated extremist content, including photorealistic images and text, which can be used to bypass safety mechanisms and elicit harmful outputs. However, existing datasets for evaluating LMM robustness offer limited exploration of extremist content, often lacking AI-generated images, diverse image generation models, and comprehensive coverage of historical events, which hinders a complete assessment of model vulnerabilities. To fill this gap, we introduce ExtremeAIGC, a benchmark dataset and evaluation framework designed to assess LMM vulnerabilities to such content. ExtremeAIGC simulates real-world events and malicious use cases by curating diverse text- and image-based examples crafted using state-of-the-art image generation techniques. Our study reveals alarming weaknesses in LMMs, demonstrating that even cutting-edge safety measures fail to prevent the generation of extremist material. We systematically quantify the success rates of various attack strategies, exposing critical gaps in current defenses and emphasizing the need for more robust mitigation strategies. The code and data can be found at https://github.com/TheProParadox/ExtremeAIGC.
Anthology ID:
2025.findings-emnlp.1176
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
21565–21579
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1176/
DOI:
10.18653/v1/2025.findings-emnlp.1176
Cite (ACL):
Bhavik Chandna, Mariam Aboujenane, and Usman Naseem. 2025. ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 21565–21579, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
ExtremeAIGC: Benchmarking LMM Vulnerability to AI-Generated Extremist Content (Chandna et al., Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.1176.pdf
Checklist:
2025.findings-emnlp.1176.checklist.pdf