Multilingual Collaborative Defense for Large Language Models

Hongliang Li, Jinan Xu, Gengping Cui, Changhao Guan, Fengran Mo, Kaiyu Huang


Abstract
The robustness and security of Large Language Models (LLMs) face increasing threats, especially in multilingual settings. A notable vulnerability is "jailbreaking" by translating harmful queries into rare or underrepresented languages, which often bypasses existing safeguards. In this work, we propose Multilingual Collaborative Defense (MCD), a novel learning method that automatically optimizes a continuous soft safety prompt to safeguard LLMs across languages. MCD leverages collaborative signals from multiple languages by rotating each language as the training "center," so that the auxiliary languages reinforce safety prompt learning and enforce cross-lingual consistency. As a result, MCD improves defense performance across all languages, reduces false refusals, and mitigates the safety misalignment caused by corpus imbalance. To evaluate MCD, we construct multilingual versions of jailbreak benchmarks such as MaliciousInstruct and AdvBench, including zero-shot languages, to assess language transferability. Experiments show that MCD outperforms prior approaches in multilingual jailbreak defense while exhibiting strong cross-lingual generalization. Our code is available at https://github.com/HLiang-Lee/MCD.
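As a rough illustration of the training scheme the abstract describes, the sketch below optimizes a single shared soft prompt while rotating each language as the optimization "center," with the remaining languages contributing down-weighted auxiliary losses. This is a minimal sketch under assumed details: the toy frozen model (ToyLM), the batch_for placeholder, and the 1.0 / 0.5 weighting scheme are all illustrative inventions, not the paper's actual implementation (see the authors' repository for that).

```python
# Minimal sketch of an MCD-style training loop (illustrative assumptions only).
import torch
import torch.nn as nn

class ToyLM(nn.Module):
    """Stand-in for a frozen LLM: maps prompt+query embeddings to logits."""
    def __init__(self, d=32, vocab=100):
        super().__init__()
        self.proj = nn.Linear(d, vocab)

    def forward(self, embeds):                  # embeds: (batch, seq, d)
        return self.proj(embeds)                # logits: (batch, seq, vocab)

d_model, prompt_len = 32, 8
languages = ["en", "zh", "bn", "sw"]            # mix of high/low-resource
model = ToyLM(d_model)
for p in model.parameters():                    # LLM weights stay frozen;
    p.requires_grad_(False)                     # only the soft prompt trains

# The continuous soft safety prompt is the sole trainable parameter.
soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def batch_for(lang):
    """Placeholder: (query embeddings, safe-response targets) for one language."""
    x = torch.randn(4, 16, d_model)             # 4 queries, 16 tokens each
    y = torch.randint(0, 100, (4, 16 + prompt_len))
    return x, y

for step in range(100):
    for center in languages:                    # rotate the training "center"
        optimizer.zero_grad()
        total = 0.0
        for lang in languages:
            x, y = batch_for(lang)
            # Prepend the shared soft prompt to every query.
            prompt = soft_prompt.expand(x.size(0), -1, -1)
            logits = model(torch.cat([prompt, x], dim=1))
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
            # Center language dominates this step; auxiliary languages act as
            # collaborative regularizers keeping the prompt cross-lingually
            # consistent (the weights here are an assumed choice).
            weight = 1.0 if lang == center else 0.5 / (len(languages) - 1)
            total = total + weight * loss
        total.backward()
        optimizer.step()
```

Because the same prompt parameter receives gradients from every language in every rotation, no single (possibly over-represented) language can dominate the learned defense, which is the intuition behind the corpus-imbalance claim in the abstract.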
Anthology ID:
2025.findings-emnlp.200
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3735–3755
URL:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.200/
DOI:
10.18653/v1/2025.findings-emnlp.200
Cite (ACL):
Hongliang Li, Jinan Xu, Gengping Cui, Changhao Guan, Fengran Mo, and Kaiyu Huang. 2025. Multilingual Collaborative Defense for Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 3735–3755, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Multilingual Collaborative Defense for Large Language Models (Li et al., Findings 2025)
PDF:
https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.200.pdf
Checklist:
2025.findings-emnlp.200.checklist.pdf