Identifying Contextual Triggers in Hate Speech Texts Using Explainable Large Language Models

Dheeraj Kodati, Bhuvana Sree Lakkireddy


Abstract
The pervasive spread of hate speech on online platforms poses a significant threat to social harmony, necessitating not only high-performing classifiers but also models capable of transparent, fine-grained interpretability. Existing methods often neglect the identification of influential contextual words that drive hate speech classification, limiting their reliability in high-stakes applications. To address this, we propose LLM-BiMACNet (Large Language Model-based Bidirectional Multi-Channel Attention Classification Network), an explainability-focused architecture that leverages pretrained language models and supervised attention to highlight key lexical indicators of hateful and offensive intent. Trained and evaluated on the HateXplain benchmark, which provides class labels, target community annotations, and human-labeled rationales, LLM-BiMACNet is optimized to jointly enhance predictive performance and rationale alignment. Experimental results demonstrate that our model outperforms existing state-of-the-art approaches, achieving an accuracy of 87.3%, AUROC of 0.881, token-level F1 of 0.553, IOU-F1 of 0.261, AUPRC of 0.874, and comprehensiveness of 0.524, thereby offering highly interpretable and accurate hate speech detection.
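
The joint objective described in the abstract can be illustrated with a minimal sketch: a classification loss combined with an attention-supervision term that pushes the model's token-level attention toward the human-annotated rationales provided by HateXplain. The abstract does not specify LLM-BiMACNet's exact layers or loss weighting, so the module name (RationaleSupervisedClassifier), the KL-based alignment term, and the lambda_rat hyperparameter below are illustrative assumptions rather than the authors' implementation.

# Illustrative sketch only: LLM-BiMACNet's exact architecture and loss are not
# given in the abstract; the names and loss weighting below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RationaleSupervisedClassifier(nn.Module):
    """Attention-based classifier whose attention weights are supervised with
    human-annotated rationale masks (HateXplain-style training)."""

    def __init__(self, hidden_dim=768, num_classes=3, lambda_rat=1.0):
        super().__init__()
        self.attn_scorer = nn.Linear(hidden_dim, 1)   # per-token attention logits
        self.classifier = nn.Linear(hidden_dim, num_classes)
        self.lambda_rat = lambda_rat                  # weight on rationale alignment

    def forward(self, token_embeddings, attention_mask, labels=None, rationale_mask=None):
        # token_embeddings: (B, T, H) from a pretrained language model encoder
        scores = self.attn_scorer(token_embeddings).squeeze(-1)               # (B, T)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)                                  # (B, T)
        pooled = torch.bmm(attn.unsqueeze(1), token_embeddings).squeeze(1)    # (B, H)
        logits = self.classifier(pooled)                                      # (B, C)

        loss = None
        if labels is not None:
            loss = F.cross_entropy(logits, labels)                            # prediction term
            if rationale_mask is not None:
                # Align attention with normalized human rationales (KL divergence).
                target = rationale_mask.float() * attention_mask.float()
                target = target / target.sum(dim=-1, keepdim=True).clamp(min=1e-8)
                loss = loss + self.lambda_rat * F.kl_div(
                    torch.log(attn.clamp(min=1e-8)), target, reduction="batchmean")
        return logits, attn, loss

In such a setup, increasing lambda_rat would be expected to trade some raw classification accuracy for closer agreement with human rationales, which is what the token-level F1, IOU-F1, and comprehensiveness figures reported above measure.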
Anthology ID:
2025.globalnlp-1.7
Volume:
Proceedings of the Workshop on Beyond English: Natural Language Processing for all Languages in an Era of Large Language Models
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Sudhansu Bala Das, Pruthwik Mishra, Alok Singh, Shamsuddeen Hassan Muhammad, Asif Ekbal, Uday Kumar Das
Venues:
GlobalNLP | WS
Publisher:
INCOMA Ltd., Shoumen, BULGARIA
Pages:
51–58
URL:
https://preview.aclanthology.org/corrections-2026-01/2025.globalnlp-1.7/
Cite (ACL):
Dheeraj Kodati and Bhuvana Sree Lakkireddy. 2025. Identifying Contextual Triggers in Hate Speech Texts Using Explainable Large Language Models. In Proceedings of the Workshop on Beyond English: Natural Language Processing for all Languages in an Era of Large Language Models, pages 51–58, Varna, Bulgaria. INCOMA Ltd., Shoumen, BULGARIA.
Cite (Informal):
Identifying Contextual Triggers in Hate Speech Texts Using Explainable Large Language Models (Kodati & Lakkireddy, GlobalNLP 2025)
PDF:
https://preview.aclanthology.org/corrections-2026-01/2025.globalnlp-1.7.pdf