Maria Nancy C

2025

pdf bib abs
SSN_IT_NLP@DravidianLangTech 2025: Abusive Tamil and Malayalam Text targeting Women on Social Media
Maria Nancy C | Radha N | Swathika R
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

The proliferation of social media platforms has resulted in increased instances of online abuse, particularly targeting marginalized groups such as women. This study focuses on the classification of abusive comments in Tamil and Malayalam, two Dravidian languages widely spoken in South India. Leveraging a multilingual BERT model, this paper provides an effective approach for detecting and categorizing abusive and non-abusive text. Using labeled datasets comprising social media comments, our model demonstrates its ability to identify targeted abuse with promising accuracy. This paper outlines the dataset preparation, model architecture, training methodology, and the evaluation of results, providing a foundation for combating online abuse in low-resource languages.The methodology is unique for its integration of multilingual BERT and weighted loss functions to address class imbalance, showcasing a pathway for effective abuse detection in other underrepresented languages. The BERT model achieved an F1-score of 0.6519 for Tamil and 0.6601 for Malayalam. The codefor this work is available on github Abusive-Text-targeting-women

Co-authors

Radha N 1
Swathika R 1

Venues

Fix data