Hanish Vigneshwar R
2026
HNK@DravidianLangTech 2026: Investigating Grapheme-Level Normalization for Abusive Tamil Text Classification
Hanish Vigneshwar R | Nahul Alaguraj | Karthikeyan Manimaran | Ratnavel Rajalakshmi
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The growing prevalence of social media has been accompanied by a rise in abusive content targeting women, particularly in regional languages such as Tamil. Automatically identifying such content is critical for creating safer online spaces. In this paper, we address the detection of abusive text targeting women, framed as a binary text classification task. We evaluate three models on this task: IndicBERT, MuRIL, and Tamil-BERT. Additionally, we propose grapheme-aware normalization, which aims to maintain the structural integrity of Tamil characters at the Unicode level. The experimental results show that Tamil-BERT with grapheme-aware normalization achieves the best performance among the evaluated models. The proposed system placed third in the shared task.
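To illustrate what normalization at the grapheme level can look like, the following is a minimal sketch and not the authors' implementation. It assumes that a grapheme can be approximated as a base code point followed by its combining marks (Unicode categories beginning with "M", which cover Tamil vowel signs and the pulli), and that normalization consists of NFC composition plus collapsing elongated runs of the same grapheme. The function names `grapheme_clusters` and `normalize_text` and the `max_repeat` parameter are hypothetical.

```python
import unicodedata


def grapheme_clusters(text):
    """Split text into approximate grapheme clusters.

    Assumption: a cluster is one base code point plus any combining
    marks that follow it. This is a simplification of the full Unicode
    segmentation rules (UAX #29), chosen to keep the sketch stdlib-only.
    """
    clusters = []
    for ch in unicodedata.normalize("NFC", text):
        if clusters and unicodedata.category(ch).startswith("M"):
            # Attach vowel signs and the pulli to the preceding base.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def normalize_text(text, max_repeat=2):
    """Collapse runs of an identical cluster to at most max_repeat copies.

    Operating on whole clusters means a consonant is never separated
    from its vowel sign or pulli during normalization.
    """
    out = []
    prev = None
    run = 0
    for cluster in grapheme_clusters(text):
        if cluster == prev:
            run += 1
        else:
            prev = cluster
            run = 1
        if run <= max_repeat:
            out.append(cluster)
    return "".join(out)


if __name__ == "__main__":
    word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"  # the word "Tamil" in Tamil script
    print(len(word), len(grapheme_clusters(word)))
```

Working on clusters rather than individual code points is the design choice of interest here: a rule that deduplicates or truncates by code point could strip a vowel sign from its consonant and leave a malformed character, whereas a cluster-level rule keeps each written unit intact.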