CUET_Sntx_Srfrs at BLP-2025 Task 1: Combining Hierarchical Classification and Ensemble Learning for Bengali Hate Speech Detection
Hafsa Hoque Tripty, Laiba Tabassum, Hasan Mesbaul Ali Taher, Kawsar Ahmed, Mohammed Moshiul Hoque
Abstract
Detecting hate speech in Bengali social media content is challenging, primarily because of the prevalence of informal language and the limited availability of annotated datasets. This study investigates the identification of hate speech in Bengali YouTube comments, focusing on classifying its type, severity, and target group. Multiple machine learning baselines and voting ensemble techniques are evaluated on these tasks. The methodology involves text preprocessing, feature extraction using TF-IDF and Count vectors, and aggregation of predictions from several models. Hierarchical classification with TF-IDF features and majority voting improves the detection of less frequent hate speech categories while maintaining robust overall performance, yielding a micro F1 score of 68.42% and an 18th-place ranking. Furthermore, ablation studies assess the impact of preprocessing steps and n-gram selection, providing reproducible baselines for Bengali hate speech detection. All code and resources are publicly available at https://github.com/Hasan-Mesbaul-Ali-Taher/BLP_25_Task_1
- Anthology ID:
- 2025.banglalp-1.48
- Volume:
- Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
- Month:
- December
- Year:
- 2025
- Address:
- Mumbai, India
- Editors:
- Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Naeemul Hassan, Enamul Hoque Prince, Mohiuddin Tasnim, Md Rashad Al Hasan Rony, Md Tahmid Rahman Rahman
- Venues:
- BanglaLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 523–530
- URL:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.banglalp-1.48/
- Cite (ACL):
- Hafsa Hoque Tripty, Laiba Tabassum, Hasan Mesbaul Ali Taher, Kawsar Ahmed, and Mohammed Moshiul Hoque. 2025. CUET_Sntx_Srfrs at BLP-2025 Task 1: Combining Hierarchical Classification and Ensemble Learning for Bengali Hate Speech Detection. In Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025), pages 523–530, Mumbai, India. Association for Computational Linguistics.
- Cite (Informal):
- CUET_Sntx_Srfrs at BLP-2025 Task 1: Combining Hierarchical Classification and Ensemble Learning for Bengali Hate Speech Detection (Tripty et al., BanglaLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.banglalp-1.48.pdf
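
The abstract describes hierarchical classification combined with majority voting. The following is a minimal sketch of that general idea, not the authors' implementation: the two-stage layout (a coarse hate/none decision followed by a fine-grained type decision), the label names `hate`, `none`, `type_a`, and `type_b`, and the helpers `majority_vote`, `hierarchical_predict`, and `keyword_model` are all assumptions made for illustration. The keyword stubs stand in for trained models over TF-IDF features.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: a classifier maps a comment to a label string.
Classifier = Callable[[str], str]


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def hierarchical_predict(
    text: str,
    coarse_models: Sequence[Classifier],
    fine_models: Sequence[Classifier],
    none_label: str = "none",
) -> str:
    """Two-stage prediction (assumed layout, not taken from the paper).

    Stage 1: an ensemble votes on whether the comment is hateful at all.
    Stage 2: only hateful comments reach the fine-grained ensemble.
    """
    coarse = majority_vote([model(text) for model in coarse_models])
    if coarse == none_label:
        return none_label
    return majority_vote([model(text) for model in fine_models])


def keyword_model(keyword: str, hit: str, miss: str) -> Classifier:
    """Toy stand-in for a trained model: predicts `hit` if keyword occurs."""
    def predict(text: str) -> str:
        return hit if keyword in text else miss
    return predict


# Illustrative ensembles built from keyword stubs (placeholder vocabulary).
coarse_models = [
    keyword_model("bad", "hate", "none"),
    keyword_model("awful", "hate", "none"),
    keyword_model("worst", "hate", "none"),
]
fine_models = [
    keyword_model("party", "type_a", "type_b"),
    keyword_model("vote", "type_a", "type_b"),
    keyword_model("party", "type_a", "type_b"),
]

if __name__ == "__main__":
    for comment in ["nice video", "bad awful party", "bad awful words"]:
        print(comment, "->", hierarchical_predict(comment, coarse_models, fine_models))
```

Routing only the comments flagged in the first stage to the second ensemble lets the fine-grained models be trained without the dominant non-hate class, which is one plausible reason such a cascade can help less frequent categories.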