GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models

Xiangdong Hu, Yangyang Jiang, Qin Hu, Xiaojun Jia


Abstract
Multimodal Large Language Models (MLLMs) have become widely deployed, yet their safety alignment remains fragile under adversarial inputs. Previous work has shown that increasing inference steps can disrupt safety mechanisms and lead MLLMs to generate attacker-desired harmful content. However, most existing attacks focus on increasing the complexity of the modified visual task itself and do not explicitly leverage the model’s own reasoning incentives. As a result, they underperform on reasoning models (models with chain-of-thought) compared with non-reasoning models (models without chain-of-thought). If a model can think like a human, can we influence its cognitive-stage decisions so that it proactively completes a jailbreak? To test this idea, we propose GAMBIT (Gamified Adversarial Multimodal Breakout via Instructional Traps), a novel multimodal jailbreak framework that decomposes and reassembles harmful visual semantics, then constructs a gamified scene that drives the model to explore, reconstruct intent, and answer as part of winning the game. The resulting structured reasoning chain increases task complexity in both vision and text and positions the model as a participant whose goal pursuit reduces safety attention and induces it to answer the reconstructed malicious query. Extensive experiments on popular reasoning and non-reasoning MLLMs demonstrate that GAMBIT achieves high Attack Success Rates (ASR), reaching 92.13% on Gemini 2.5 Flash, 91.20% on QvQ-MAX, and 85.87% on GPT-4o, significantly outperforming baselines. Warning: This paper contains unsafe and offensive examples.
Anthology ID:
2026.acl-long.367
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
8112–8129
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.367/
Cite (ACL):
Xiangdong Hu, Yangyang Jiang, Qin Hu, and Xiaojun Jia. 2026. GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8112–8129, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
GAMBIT: A Gamified Jailbreak Framework for Multimodal Large Language Models (Hu et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.367.pdf
Checklist:
 2026.acl-long.367.checklist.pdf