Ruolei Zhang
2026
Cross-Lingual Bias in Large Language Models: A Comparative Analysis of English and Swahili
Ruolei Zhang | Teddy Njuguna | Yue Feng
Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM 2026)
Large language models are increasingly deployed in multilingual contexts, yet safety alignment and bias evaluation remain overwhelmingly English-centric. We investigate whether social biases generalise across languages by submitting 4,900 symmetric English–Swahili prompt pairs to GPT-5.2 and Gemini 2.5 Flash across nine demographic bias axes, yielding 19,600 completions evaluated for stereotype prevalence, sentiment, refusal behaviour, and cross-lingual semantic similarity. Our findings show that bias transforms rather than transfers: stereotype rates shifted by up to 12 percentage points on specific axes, Gemini’s neutral-sentiment rate doubled in Swahili, and GPT-5.2 refused 169 prompts in English and none in Swahili, suggesting that its safety mechanisms are functionally anchored to English-language tokens. Over 55% of prompt pairs produced semantically dissimilar completions across both models. These results reinforce the view that English-only bias audits provide inadequate coverage for multilingual deployment.