Multilingual Large Language Models Leak Human Stereotypes across Language Boundaries
Yang Trista Cao | Anna Sotnikova | Jieyu Zhao | Linda X. Zou | Rachel Rudinger | Hal Daumé III
Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI), 2025
Multilingual large language models have gained prominence for their proficiency in processing and generating text across languages. Like their monolingual counterparts, multilingual models are likely to pick up on stereotypes and other social biases during training. In this paper, we study a phenomenon we term “stereotype leakage”: training a model multilingually may lead to stereotypes expressed in one language surfacing in the model's behavior in another. We propose a measurement framework for stereotype leakage and investigate its effect across English, Russian, Chinese, and Hindi with GPT-3.5, mT5, and mBERT. Our findings show noticeable leakage of positive, negative, and nonpolar associations across all languages. Of these models, GPT-3.5 exhibits the most stereotype leakage, and Hindi is the language most susceptible to leakage effects.
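The paper's measurement framework is not reproduced here, but as a rough illustration of the kind of probing such a study involves, the sketch below scores a group-attribute association with mBERT via the masked-token log-probability of the attribute word. The template, group term, and attribute are hypothetical placeholders, not the paper's actual stimuli, and the paper's method may differ.

```python
# A minimal sketch (not the paper's exact framework) of probing a multilingual
# masked language model for a group-attribute association score with mBERT.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

def association_score(template: str, group: str, attribute: str) -> float:
    """Log-probability the model assigns to `attribute` at the masked slot.

    `template` must contain the placeholders {group} and {mask}. Comparing
    scores for the same (group, attribute) pair across languages is one
    simple way to look for cross-lingual stereotype leakage.
    """
    text = template.format(group=group, mask=tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")
    # Locate the [MASK] position in the tokenized input.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits
    log_probs = torch.log_softmax(logits[0, mask_pos], dim=-1)
    # This sketch handles single-token attributes only; multi-token words
    # would need a pseudo-log-likelihood treatment instead.
    attr_id = tokenizer.convert_tokens_to_ids(attribute)
    return log_probs[attr_id].item()

# Illustrative usage with made-up English stimuli.
score = association_score("The {group} people are very {mask}.", "British", "polite")
print(f"association score: {score:.3f}")
```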