BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla

Mahammed Kamruzzaman, Abdullah Al Monsur, Shrabon Kumar Das, Enamul Hassan, Gene Louis Kim


Abstract
This study presents ***BanStereoSet***, a dataset designed to evaluate stereotypical social biases in multilingual LLMs for the Bangla language. To extend bias research beyond English-centric datasets, we localized content from the StereoSet, IndiBias, and Kamruzzaman et al. datasets, producing a resource tailored to capture biases prevalent within the Bangla-speaking community. BanStereoSet consists of 1,194 sentences spanning 9 categories of bias: race, profession, gender, ageism, beauty, beauty in profession, region, caste, and religion. The dataset serves as a tool for measuring bias in multilingual LLMs and facilitates the exploration of stereotypical bias across different social categories, potentially guiding the development of more equitable language technologies in *Bangladeshi* contexts. Our analysis of several language models using this dataset indicates significant biases, reinforcing the need for culturally and linguistically adapted datasets.
Anthology ID:
2025.findings-acl.179
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
3450–3460
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.179/
Cite (ACL):
Mahammed Kamruzzaman, Abdullah Al Monsur, Shrabon Kumar Das, Enamul Hassan, and Gene Louis Kim. 2025. BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla. In Findings of the Association for Computational Linguistics: ACL 2025, pages 3450–3460, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla (Kamruzzaman et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.179.pdf