BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla

Mahammed Kamruzzaman; Abdullah Al Monsur; Shrabon Kumar Das; Enamul Hassan; Gene Louis Kim

BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla

Mahammed Kamruzzaman, Abdullah Al Monsur, Shrabon Kumar Das, Enamul Hassan, Gene Louis Kim

Abstract

This study presents ***BanStereoSet***, a dataset designed to evaluate stereotypical social biases in multilingual LLMs for the Bangla language. In an effort to extend the focus of bias research beyond English-centric datasets, we have localized the content from the StereoSet, IndiBias, and kamruzzaman-etal’s datasets, producing a resource tailored to capture biases prevalent within the Bangla-speaking community. Our BanStereoSet dataset consists of 1,194 sentences spanning 9 categories of bias: race, profession, gender, ageism, beauty, beauty in profession, region, caste, and religion. This dataset not only serves as a crucial tool for measuring bias in multilingual LLMs but also facilitates the exploration of stereotypical bias across different social categories, potentially guiding the development of more equitable language technologies in *Bangladeshi* contexts. Our analysis of several language models using this dataset indicates significant biases, reinforcing the necessity for culturally and linguistically adapted datasets to develop more equitable language technologies.

Anthology ID:: 2025.findings-acl.179
Volume:: Findings of the Association for Computational Linguistics: ACL 2025
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:: Findings
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 3450–3460
Language:
URL:: https://preview.aclanthology.org/landing_page/2025.findings-acl.179/
DOI:
Bibkey:
Cite (ACL):: Mahammed Kamruzzaman, Abdullah Al Monsur, Shrabon Kumar Das, Enamul Hassan, and Gene Louis Kim. 2025. BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla. In Findings of the Association for Computational Linguistics: ACL 2025, pages 3450–3460, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla (Kamruzzaman et al., Findings 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/landing_page/2025.findings-acl.179.pdf

PDF Cite Search Fix data