Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding

Haneul Yoo, Yongjin Yang, Hwaran Lee


Abstract
As large language models (LLMs) have advanced rapidly, concerns about their safety have become prominent. In this paper, we find that code-switching, a common practice in natural language, can effectively elicit undesirable behaviors from LLMs when applied to red-teaming queries. We introduce CSRT, a simple yet effective framework that synthesizes code-switching red-teaming queries and comprehensively investigates both the safety and the multilingual understanding of LLMs. Through extensive experiments with ten state-of-the-art LLMs and code-switching queries combining up to ten languages, we demonstrate that CSRT significantly outperforms existing multilingual red-teaming techniques, achieving 46.7% more successful attacks than standard English attacks while remaining effective across conventional safety domains. We also examine the ability of these LLMs to generate and understand code-switching text. In addition, we validate the extensibility of CSRT by generating code-switching attack prompts from monolingual data. Finally, we conduct detailed ablation studies on code-switching and reveal an unintended correlation between the resource availability of languages and safety alignment in existing multilingual LLMs.
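
The abstract does not spell out the synthesis pipeline, but a minimal sketch of how a code-switching red-teaming query might be assembled appears below. The segment-level mixing strategy, the language pool, and the translate stub are all illustrative assumptions, not the authors' actual CSRT implementation.

import random

# Hypothetical pool of languages to mix into an English query.
LANGUAGES = ["ko", "es", "hi", "ar", "sw"]

def translate(segment: str, lang: str) -> str:
    """Placeholder for any machine-translation backend (e.g., an MT
    API). Stubbed here; plug in a real system before use."""
    raise NotImplementedError("supply a translation backend")

def code_switch(query: str, langs=LANGUAGES, seed=0) -> str:
    """Turn a monolingual English query into a code-switching query by
    translating each whitespace-delimited segment into a randomly
    chosen language, keeping English as one option in the pool."""
    rng = random.Random(seed)          # seeded for reproducibility
    pool = ["en"] + list(langs)
    mixed = []
    for segment in query.split():
        lang = rng.choice(pool)
        mixed.append(segment if lang == "en" else translate(segment, lang))
    return " ".join(mixed)

In this sketch, a single seed fixes the language assignment per segment, so the same English query always yields the same mixed-language prompt; varying the seed or the pool would produce the multi-language combinations the paper evaluates.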
Anthology ID:
2025.acl-long.657
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
13392–13413
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.657/
Cite (ACL):
Haneul Yoo, Yongjin Yang, and Hwaran Lee. 2025. Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13392–13413, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding (Yoo et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.657.pdf