Prachi Goyal
2026
Mind the Gap: Multilingual Divide in LLM Bias Detection and Reasoning
Medha Hira | Prachi Goyal | Raj Maheshwari | Arnav Goel
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Large Language Models (LLMs) are increasingly deployed in multilingual settings, yet most bias evaluation remains English-centric and overlooks how bias manifests within reasoning. We present a systematic study of social bias in both predictions and chain-of-thought reasoning across English, Dutch, Spanish, and Turkish using the MBBQ benchmark. We evaluate instruction-tuned, CoT-prompted, and reasoning-native models under supervised fine-tuning and preference optimization, using accuracy, F1, bias metrics, and a novel reasoning-level language drift measure. We find that (1) bias varies substantially across languages, with consistent degradation in non-English settings, (2) reasoning traces often introduce additional stereotype-driven signals beyond final outputs, and (3) English-trained debiasing methods fail to generalize reliably, with preference optimization introducing cross-lingual trade-offs. We further show that performance gains in multilingual settings are frequently driven by implicit reliance on English-centric reasoning, revealed through increased language drift. Together, our results demonstrate that multilingual fairness cannot be inferred from English performance and requires reasoning-aware, language-specific evaluation and alignment.