Beyond Words: Multilingual and Multimodal Red Teaming of MLLMs

Erik Derner, Kristina Batistič


Abstract
Multimodal large language models (MLLMs) are increasingly deployed in real-world applications, yet their safety remains underexplored, particularly in multilingual and visual contexts. In this work, we present a systematic red teaming framework to evaluate MLLM safeguards using adversarial prompts translated into seven languages and delivered via four input modalities: plain text, jailbreak prompt + text, text rendered as an image, and jailbreak prompt + text rendered as an image. We find that rendering prompts as images increases attack success rates and reduces refusal rates, with the effect most pronounced in lower-resource languages such as Slovenian, Czech, and Valencian. Our results suggest that vision-based multilingual attacks expose a persistent gap in current alignment strategies, highlighting the need for robust multilingual and multimodal MLLM safety evaluation and mitigation of these risks. We make our code and data available.
Anthology ID:
2025.llmsec-1.15
Volume:
Proceedings of The First Workshop on LLM Security (LLMSEC)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editor:
Jekaterina Novikova
Venues:
LLMSEC | WS
SIG:
SIGSEC
Publisher:
Association for Computational Linguistics
Pages:
198–206
URL:
https://preview.aclanthology.org/corrections-2025-08/2025.llmsec-1.15/
Cite (ACL):
Erik Derner and Kristina Batistič. 2025. Beyond Words: Multilingual and Multimodal Red Teaming of MLLMs. In Proceedings of The First Workshop on LLM Security (LLMSEC), pages 198–206, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Beyond Words: Multilingual and Multimodal Red Teaming of MLLMs (Derner & Batistič, LLMSEC 2025)
PDF:
https://preview.aclanthology.org/corrections-2025-08/2025.llmsec-1.15.pdf
Supplementary material:
2025.llmsec-1.15.SupplementaryMaterial.zip
Supplementary material:
2025.llmsec-1.15.SupplementaryMaterial.txt