SubmissionNumber#=%=#35
FinalPaperTitle#=%=#Beyond Words: Multilingual and Multimodal Red Teaming of MLLMs
ShortPaperTitle#=%=#
NumberOfPages#=%=#9
CopyrightSigned#=%=#Erik Derner
JobTitle#==#
Organization#==#ELLIS Alicante
Fundación de la Comunitat Valenciana Unidad ELLIS Alicante
Muelle de Poniente 5
Distrito Digital 5, Edificio A, Puerto de Alicante
03001 Alicante (Alicante)
Spain
Abstract#==#Multimodal large language models (MLLMs) are increasingly deployed in real-world applications, yet their safety remains underexplored, particularly in multilingual and visual contexts. In this work, we present a systematic red teaming framework to evaluate MLLM safeguards using adversarial prompts translated into seven languages and delivered via four input modalities: plain text, jailbreak prompt + text, text rendered as an image, and jailbreak prompt + text rendered as an image. We find that rendering prompts as images increases attack success rates and reduces refusal rates, with the effect most pronounced in lower-resource languages such as Slovenian, Czech, and Valencian. Our results suggest that vision-based multilingual attacks expose a persistent gap in current alignment strategies, highlighting the need for robust multilingual and multimodal safety evaluation of MLLMs and for mitigation of these risks. We make our code and data available.
Author{1}{Firstname}#=%=#Erik
Author{1}{Lastname}#=%=#Derner
Author{1}{Username}#=%=#erik.derner
Author{1}{Email}#=%=#erik@ellisalicante.org
Author{1}{Affiliation}#=%=#ELLIS Alicante
Author{2}{Firstname}#=%=#Kristina
Author{2}{Lastname}#=%=#Batistič
Author{2}{Email}#=%=#kristina.batistic@gmail.com
Author{2}{Affiliation}#=%=#Independent Researcher

==========
èéáğö