Mātṛkā: Multilingual Jailbreak Evaluation of Open-Source Large Language Models

Murali Emani, Kashyap Manjusha R


Abstract
Artificial Intelligence (AI) and Large Language Models (LLMs) are increasingly integrated into high-stakes applications, yet their susceptibility to adversarial prompts poses significant security risks. In this work, we introduce Mātṛkā, a framework for systematically evaluating jailbreak vulnerabilities in open-source multilingual LLMs. Using an open-source dataset spanning nine sensitive categories, we constructed adversarial prompt sets that combine translation, mixed-language encoding, homoglyph signatures, numeric enforcement, and structural variations. Experiments were conducted on state-of-the-art open-source models from the Llama, Qwen, GPT-OSS, Mistral, and Gemma families. Our findings highlight the transferability of jailbreaks across multiple languages, with success rates varying by attack design. We provide empirical insights, a novel taxonomy of multilingual jailbreak strategies, and recommendations for enhancing robustness in safety-critical environments.
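As an illustration of one attack primitive named in the abstract, the sketch below shows homoglyph substitution: Latin letters are replaced with visually identical Cyrillic code points so that a prompt looks unchanged to a reader but no longer matches keyword-based safety filters. The mapping and function here are hypothetical examples, not the paper's actual implementation.

```python
# Hypothetical homoglyph-substitution sketch (not the paper's code).
# Each Latin letter maps to a visually confusable Cyrillic letter.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "c": "\u0441",  # Cyrillic small es
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
}

def homoglyph_encode(text: str) -> str:
    """Replace each mapped Latin letter with its Cyrillic look-alike."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

encoded = homoglyph_encode("open source")
print(encoded)                    # renders the same to the eye
print(encoded == "open source")   # False: differs at the code-point level
```

A naive substring filter that blocks the Latin string "open source" would not match the encoded variant, which is the evasion effect such attacks exploit.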
Anthology ID:
2025.bhasha-1.10
Volume:
Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025)
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Arnab Bhattacharya, Pawan Goyal, Saptarshi Ghosh, Kripabandhu Ghosh
Venues:
BHASHA | WS
Publisher:
Association for Computational Linguistics
Pages:
117–121
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.bhasha-1.10/
Cite (ACL):
Murali Emani and Kashyap Manjusha R. 2025. Mātṛkā: Multilingual Jailbreak Evaluation of Open-Source Large Language Models. In Proceedings of the 1st Workshop on Benchmarks, Harmonization, Annotation, and Standardization for Human-Centric AI in Indian Languages (BHASHA 2025), pages 117–121, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
Mātṛkā: Multilingual Jailbreak Evaluation of Open-Source Large Language Models (Emani & R, BHASHA 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.bhasha-1.10.pdf