Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding

Haneul Yoo, Yongjin Yang, Hwaran Lee


Abstract
As large language models (LLMs) have advanced rapidly, concerns about their safety have become prominent. In this paper, we find that code-switching, a common practice in natural language, can effectively elicit undesirable behaviors from LLMs when applied to red-teaming queries. We introduce CSRT, a simple yet effective framework that synthesizes code-switching red-teaming queries and comprehensively investigates both the safety and the multilingual understanding of LLMs. Through extensive experiments with ten state-of-the-art LLMs and code-switching queries combining up to ten languages, we demonstrate that CSRT significantly outperforms existing multilingual red-teaming techniques, achieving 46.7% more successful attacks than standard English attacks while remaining effective across conventional safety domains. We also examine the ability of these LLMs to generate and understand code-switching text. In addition, we validate the extensibility of CSRT by generating code-switching attack prompts from monolingual data. Finally, we conduct detailed ablation studies on code-switching and reveal an unintended correlation between the resource availability of languages and safety alignment in existing multilingual LLMs.
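
The abstract does not spell out the synthesis pipeline, but a minimal sketch of how a code-switching red-teaming query might be assembled appears below. The segment-level mixing strategy, the language pool, and the translate stub are all illustrative assumptions, not the authors' actual CSRT implementation.

import random

# Hypothetical pool of languages to mix into an English query.
LANGUAGES = ["ko", "es", "hi", "ar", "sw"]

def translate(segment: str, lang: str) -> str:
    """Placeholder for any machine-translation backend (e.g., an MT
    API). Stubbed here; plug in a real system before use."""
    raise NotImplementedError("supply a translation backend")

def code_switch(query: str, langs=LANGUAGES, seed=0) -> str:
    """Turn a monolingual English query into a code-switching query by
    translating each whitespace-delimited segment into a randomly
    chosen language, keeping English as one option in the pool."""
    rng = random.Random(seed)          # seeded for reproducibility
    pool = ["en"] + list(langs)
    mixed = []
    for segment in query.split():
        lang = rng.choice(pool)
        mixed.append(segment if lang == "en" else translate(segment, lang))
    return " ".join(mixed)

In this sketch, a single seed fixes the language assignment per segment, so the same English query always yields the same mixed-language prompt; varying the seed or the pool would produce the multi-language combinations the paper evaluates.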
Anthology ID:
2025.acl-long.657
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
13392–13413
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.657/
Cite (ACL):
Haneul Yoo, Yongjin Yang, and Hwaran Lee. 2025. Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13392–13413, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding (Yoo et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.657.pdf