Assessing Gender Bias of Pretrained Bangla Language Models in STEM and SHAPE Fields
Noor Mairukh Khan Arnob, Saiyara Mahmud, Azmine Toushik Wasi
Abstract
Gender bias continues to shape societal perceptions across both STEM (Science, Technology, Engineering, and Mathematics) and SHAPE (Social Sciences, Humanities, and the Arts for People and the Economy) domains. While existing studies have explored such biases in English language models, similar analyses in Bangla—spoken by over 240 million people—remain scarce. In this work, we investigate gender-profession associations in Bangla language models. We introduce Pokkhopat, a curated dataset of gendered terms and profession-related words across STEM and SHAPE disciplines. Using a suite of embedding-based bias detection methods—including WEAT, ECT, RND, RIPA, and cosine similarity visualizations—we evaluate 11 Bangla language models. Our findings show that several widely used open-source Bangla NLP models (e.g., sagorsarker/bangla-bert-base) exhibit significant gender bias, underscoring the need for more inclusive and bias-aware development in low-resource languages like Bangla. We also find that many STEM- and SHAPE-related words are absent from these models' vocabularies, complicating bias detection and possibly amplifying existing biases. This emphasizes the importance of incorporating more diverse and comprehensive training data to mitigate such biases moving forward. Code available at https://github.com/HerWILL-Inc/ACL-2025/.
- Anthology ID:
- 2025.gebnlp-1.24
- Volume:
- Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP)
- Month:
- August
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Agnieszka Faleńska, Christine Basta, Marta Costa-jussà, Karolina Stańczak, Debora Nozza
- Venues:
- GeBNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 268–281
- URL:
- https://preview.aclanthology.org/display_plenaries/2025.gebnlp-1.24/
- Cite (ACL):
- Noor Mairukh Khan Arnob, Saiyara Mahmud, and Azmine Toushik Wasi. 2025. Assessing Gender Bias of Pretrained Bangla Language Models in STEM and SHAPE Fields. In Proceedings of the 6th Workshop on Gender Bias in Natural Language Processing (GeBNLP), pages 268–281, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- Assessing Gender Bias of Pretrained Bangla Language Models in STEM and SHAPE Fields (Arnob et al., GeBNLP 2025)
- PDF:
- https://preview.aclanthology.org/display_plenaries/2025.gebnlp-1.24.pdf
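As context for the embedding-based methods named in the abstract, the core of WEAT (Word Embedding Association Test) is a cosine-similarity effect size between two target sets (e.g., STEM vs. SHAPE profession words) and two attribute sets (e.g., male vs. female gendered terms). The sketch below uses randomly generated toy vectors, not embeddings from any of the evaluated Bangla models; all word-list labels are illustrative assumptions.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # s(w, A, B): mean similarity of word w with attribute set A
    # minus its mean similarity with attribute set B
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    # Caliskan et al.-style effect size: difference of the two target
    # sets' mean associations, normalized by the pooled sample std. dev.
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Toy stand-ins for embedding lookups (NOT the paper's Pokkhopat data)
rng = np.random.default_rng(0)
male = rng.normal(size=(3, 8))    # e.g., vectors of male gendered terms
female = rng.normal(size=(3, 8))  # e.g., vectors of female gendered terms
stem = rng.normal(size=(4, 8))    # e.g., vectors of STEM profession words
shape = rng.normal(size=(4, 8))   # e.g., vectors of SHAPE profession words

d = weat_effect_size(stem, shape, male, female)
print(f"WEAT effect size: {d:.3f}")
```

A value of `d` far from zero would suggest the target sets associate differentially with the gendered attribute sets; with real models the vectors would come from each Bangla model's embedding layer, and (as the abstract notes) out-of-vocabulary profession words would first have to be handled.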