Mainul Haque


BanglaHateBERT: BERT for Abusive Language Detection in Bengali
Md Saroar Jahan | Mainul Haque | Nabil Arhab | Mourad Oussalah
Proceedings of the Second International Workshop on Resources and Techniques for User Information in Abusive Language Analysis

This paper introduces BanglaHateBERT, a retrained BERT model for abusive language detection in Bengali. The model was trained on a large-scale Bengali corpus of offensive, abusive, and hateful content that we collected from different sources and made publicly available. Furthermore, we collected and manually annotated a balanced 15K-sample Bengali hate speech dataset and made it publicly available to the research community. We took the existing pre-trained BanglaBERT model and retrained it on 1.5 million offensive posts. We present a detailed comparison between the generic pre-trained language model and its retrained, abuse-inclined version. On all datasets, BanglaHateBERT outperformed the corresponding available BERT model.