SubmissionNumber#=%=#35
FinalPaperTitle#=%=#Beyond Words: Multilingual and Multimodal Red Teaming of MLLMs
ShortPaperTitle#=%=#
NumberOfPages#=%=#9
CopyrightSigned#=%=#Erik Derner
JobTitle#==#
Organization#==#ELLIS Alicante Fundación de la Comunitat Valenciana Unidad ELLIS Alicante Muelle de Poniente 5 Distrito Digital 5, Edificio A, Puerto de Alicante 03001 Alicante (Alicante) Spain
Abstract#==#Multimodal large language models (MLLMs) are increasingly deployed in real-world applications, yet their safety remains underexplored, particularly in multilingual and visual contexts. In this work, we present a systematic red teaming framework to evaluate MLLM safeguards using adversarial prompts translated into seven languages and delivered via four input modalities: plain text, jailbreak prompt + text, text rendered as an image, and jailbreak prompt + text rendered as an image. We find that rendering prompts as images increases attack success rates and reduces refusal rates, with the effect most pronounced in lower-resource languages such as Slovenian, Czech, and Valencian. Our results suggest that vision-based multilingual attacks expose a persistent gap in current alignment strategies, highlighting the need for robust multilingual and multimodal MLLM safety evaluation and mitigation of these risks. We make our code and data available.
Author{1}{Firstname}#=%=#Erik
Author{1}{Lastname}#=%=#Derner
Author{1}{Username}#=%=#erik.derner
Author{1}{Email}#=%=#erik@ellisalicante.org
Author{1}{Affiliation}#=%=#ELLIS Alicante
Author{2}{Firstname}#=%=#Kristina
Author{2}{Lastname}#=%=#Batistič
Author{2}{Email}#=%=#kristina.batistic@gmail.com
Author{2}{Affiliation}#=%=#Independent Researcher
==========
èéáğö