Assessing Gender Bias of Pretrained Bangla Language Models in STEM and SHAPE Fields
Noor Mairukh Khan Arnob, Saiyara Mahmud, Azmine Toushik Wasi
Abstract
Gender bias continues to shape societal perceptions across both STEM (Science, Technology, Engineering, and Mathematics) and SHAPE (Social Sciences, Humanities, and the Arts for People and the Economy) domains. While existing studies have explored such biases in English language models, similar analyses in Bangla—spoken by over 240 million people—remain scarce. In this work, we investigate gender-profession associations in Bangla language models. We introduce Pokkhopat, a curated dataset of gendered terms and profession-related words across STEM and SHAPE disciplines. Using a suite of embedding-based bias detection methods—including WEAT, ECT, RND, RIPA, and cosine similarity visualizations—we evaluate 11 Bangla language models. Our findings show that several widely used open-source Bangla NLP models (e.g., sagorsarker/bangla-bert-base) exhibit significant gender bias, underscoring the need for more inclusive and bias-aware development in low-resource languages like Bangla. We also find that many STEM- and SHAPE-related words are absent from these models' vocabularies, complicating bias detection and possibly amplifying existing biases. This emphasizes the importance of incorporating more diverse and comprehensive training data to mitigate such biases moving forward. Code available at https://github.com/HerWILL-Inc/ACL-2025/.
- Anthology ID:
- 2025.gebnlp-1.24
- Volume:
- Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
- Month:
- August
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Karolina Stańczak, Debora Nozza
- Venues:
- GeBNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 268–281
- URL:
- https://preview.aclanthology.org/display_plenaries/2025.gebnlp-1.24/
- Cite (ACL):
- Noor Mairukh Khan Arnob, Saiyara Mahmud, and Azmine Toushik Wasi. 2025. Assessing Gender Bias of Pretrained Bangla Language Models in STEM and SHAPE Fields. In Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 268–281, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Assessing Gender Bias of Pretrained Bangla Language Models in STEM and SHAPE Fields (Arnob et al., GeBNLP 2025)
- PDF:
- https://preview.aclanthology.org/display_plenaries/2025.gebnlp-1.24.pdf
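As context for the embedding-based methods named in the abstract, the core of WEAT (Word Embedding Association Test) is a cosine-similarity effect size between two target sets (e.g., STEM vs. SHAPE profession words) and two attribute sets (e.g., male vs. female gendered terms). The sketch below uses randomly generated toy vectors, not embeddings from any of the evaluated Bangla models; all word-list labels are illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # s(w, A, B): mean similarity of word w with attribute set A
    # minus its mean similarity with attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Caliskan et al.-style effect size: difference of the two target
    # sets' mean associations, normalized by the pooled sample std. dev.
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Toy stand-ins for embedding lookups (NOT the paper's Pokkhopat data)
rng = np.random.default_rng(0)
male = rng.normal(size=(3, 8))    # e.g., vectors of male gendered terms
female = rng.normal(size=(3, 8))  # e.g., vectors of female gendered terms
stem = rng.normal(size=(4, 8))    # e.g., vectors of STEM profession words
shape = rng.normal(size=(4, 8))   # e.g., vectors of SHAPE profession words

d = weat_effect_size(stem, shape, male, female)
print(f"WEAT effect size: {d:.3f}")
```

A value of `d` far from zero would suggest the target sets associate differentially with the gendered attribute sets; with real models the vectors would come from each Bangla model's embedding layer, and (as the abstract notes) out-of-vocabulary profession words would first have to be handled.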