Stereotype Bias in a Bilingual Setting: A Culturally Grounded Evaluation in Kazakhstan

Nurkhan Laiyk; Daniil Orel; Ayana Mussabayeva; Maiya Goloburda; Kamila Kuishibekova; Liya Goloburda; Diana Turmakhan; Preslav Nakov; Yuxia Wang; Fajri Koto

Abstract

Stereotype bias in language models has been widely examined in English, but remains largely understudied in bilingual contexts where multiple linguistic and cultural systems interact. This gap is especially important in regions where language use reflects complex historical and sociopolitical influences. In this work, we focus on Kazakhstan, a bilingual society where Kazakh, a low-resource Turkic language, and Russian, a high-resource Slavic language, are both actively used and frequently code-mixed in everyday communication. We introduce Aqbileq, a high-quality, human-verified dataset consisting of 5,634 stereotype-bearing statements in Kazakh, Russian, and code-mixed forms, covering six culturally salient domains. We evaluate both multilingual and Kazakh-specific language models using perplexity-based scoring and pretraining simulations, and find that stereotype bias is most pronounced in code-mixed inputs. Our results highlight the limitations of existing evaluation frameworks and emphasize the need for culturally grounded, linguistically inclusive benchmarks to better assess and mitigate bias in language models.

Anthology ID: 2026.acl-long.598
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 13114–13131
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.598/
Cite (ACL): Nurkhan Laiyk, Daniil Orel, Ayana Mussabayeva, Maiya Goloburda, Kamila Kuishibekova, Liya Goloburda, Diana Turmakhan, Preslav Nakov, Yuxia Wang, and Fajri Koto. 2026. Stereotype Bias in a Bilingual Setting: A Culturally Grounded Evaluation in Kazakhstan. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13114–13131, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Stereotype Bias in a Bilingual Setting: A Culturally Grounded Evaluation in Kazakhstan (Laiyk et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.598.pdf
Checklist: 2026.acl-long.598.checklist.pdf
