Heisenberg at BLP-2025 Task 1: Bangla Hate Speech Classification using Pretrained Language Models and Data Augmentation

Samin Yasir


Abstract
Detecting hate speech in Bangla is challenging because of the language's complex vocabulary, spelling variations, and region-specific word usage. Effective detection is nonetheless essential for keeping social media spaces safe and for taking appropriate action against perpetrators. In this study, we report our participation in Subtask A of Task 1: Bangla Hate Speech Detection (Hasan et al., 2025b). In addition to the provided 50K Bangla comments (Hasan et al., 2025a), we collected approximately 4K Bangla comments and applied several data augmentation techniques. We evaluated several transformer-based models (e.g., BanglaBERT, BanglaT5, BanglaHateBERT); our best system achieved a micro-F1 score of 71% and placed 18th in the Evaluation Phase.
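For readers unfamiliar with the reported metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed, assuming each comment receives exactly one label. The function name `micro_f1` and the label strings are hypothetical illustrations, not taken from the paper or the shared-task scorer. Under this single-label assumption, every misclassification counts as one false positive and one false negative, so micro-F1 coincides with accuracy.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1: pool TP/FP/FN over all classes, then compute F1.

    Assumes single-label classification (one gold label and one
    predicted label per example).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p:
            tp += 1          # correct prediction: true positive for class g
        else:
            fp += 1          # false positive for the predicted class p
            fn += 1          # false negative for the gold class g

    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels, for illustration only.
gold = ["None", "Abusive", "Profane", "None"]
pred = ["None", "None", "Profane", "None"]
score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.2f}")
```

In this toy example three of four predictions are correct, so the pooled counts are TP = 3, FP = 1, FN = 1.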
Anthology ID:
2025.banglalp-1.42
Volume:
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Naeemul Hassan, Enamul Hoque Prince, Mohiuddin Tasnim, Md Rashad Al Hasan Rony, Md Tahmid Rahman Rahman
Venues:
BanglaLP | WS
Publisher:
Association for Computational Linguistics
Pages:
476–484
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.banglalp-1.42/
Cite (ACL):
Samin Yasir. 2025. Heisenberg at BLP-2025 Task 1: Bangla Hate Speech Classification using Pretrained Language Models and Data Augmentation. In Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025), pages 476–484, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
Heisenberg at BLP-2025 Task 1: Bangla Hate Speech Classification using Pretrained Language Models and Data Augmentation (Yasir, BanglaLP 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.banglalp-1.42.pdf