HNK@DravidianLangTech 2026: Investigating Grapheme-Level Normalization for Abusive Tamil Text Classification
Hanish Vigneshwar R, Nahul Alaguraj, Karthikeyan Manimaran, Ratnavel Rajalakshmi
Abstract
The growth of social media has been accompanied by an increase in abusive content targeting women, particularly in regional languages such as Tamil. Automatic identification of such content is critical for creating safer online spaces. In this paper, we treat the detection of abusive text targeting women as a binary text classification task. We evaluate the proposed system using the IndicBERT, MuRIL, and Tamil-BERT models. Additionally, we propose grapheme-aware normalization, which aims to maintain the structural integrity of Tamil characters at the Unicode level. Experimental results show that Tamil-BERT with grapheme-aware normalization achieves the best performance among the evaluated models. The proposed system placed third in the shared task.
- Anthology ID:
- 2026.dravidianlangtech-1.38
- Volume:
- Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
- Month:
- July
- Year:
- 2026
- Address:
- Underline (Virtual)
- Editors:
- Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
- Venues:
- DravidianLangTech | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 258–262
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.38/
- Cite (ACL):
- Hanish Vigneshwar R, Nahul Alaguraj, Karthikeyan Manimaran, and Ratnavel Rajalakshmi. 2026. HNK@DravidianLangTech 2026: Investigating Grapheme-Level Normalization for Abusive Tamil Text Classification. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 258–262, Underline (Virtual). Association for Computational Linguistics.
- Cite (Informal):
- HNK@DravidianLangTech 2026: Investigating Grapheme-Level Normalization for Abusive Tamil Text Classification (R et al., DravidianLangTech 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.38.pdf
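The abstract describes grapheme-aware normalization only at a high level, so the following is a minimal sketch of what such a step might look like, not the authors' implementation. It assumes two things that the abstract does not state: that text is first brought to Unicode NFC form, and that clean-up operations (here, collapsing elongated repeats, chosen purely as an illustration) act on whole grapheme clusters rather than on individual code points. The function names `grapheme_clusters` and `normalize_tamil` and the `max_repeat` parameter are hypothetical. The cluster rule (a base character plus any following combining marks) is a simplification of full Unicode grapheme segmentation.

```python
import unicodedata

# Unicode general categories for combining marks (vowel signs, pulli/virama, etc.).
MARK_CATEGORIES = ("Mn", "Mc", "Me")


def grapheme_clusters(text):
    """Split text into approximate grapheme clusters.

    Simplified rule (an assumption, not full UAX #29): every combining mark
    is attached to the character that precedes it.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in MARK_CATEGORIES:
            clusters[-1] += ch  # keep the vowel sign or pulli with its base
        else:
            clusters.append(ch)
    return clusters


def normalize_tamil(text, max_repeat=2):
    """Hypothetical grapheme-aware normalization.

    1. NFC-normalize so that equivalent encodings of the same vowel sign
       (e.g. U+0BC6 + U+0BBE versus U+0BCA) become identical.
    2. Collapse runs of the same grapheme cluster to at most `max_repeat`
       copies, never separating a consonant from its marks.
    """
    text = unicodedata.normalize("NFC", text)
    out = []
    prev, run = None, 0
    for cluster in grapheme_clusters(text):
        run = run + 1 if cluster == prev else 1
        prev = cluster
        if run <= max_repeat:
            out.append(cluster)
    return "".join(out)


if __name__ == "__main__":
    word = "தமிழ்"
    print(len(word), len(grapheme_clusters(word)))  # 5 code points, 3 clusters
    print(normalize_tamil("சூசூசூசூசூ"))  # elongated repeats collapsed
```

Working on clusters rather than code points means a rule such as repeat-collapsing or truncation cannot leave a dangling vowel sign or pulli without its base consonant. A fuller treatment would likely use a complete grapheme segmenter, for example the `\X` pattern of the third-party `regex` package, instead of the simplified rule above.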