Radha N
2025
SSN_IT_NLP@DravidianLangTech 2025: Abusive Tamil and Malayalam Text targeting Women on Social Media
Maria Nancy C | Radha N | Swathika R
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The proliferation of social media platforms has resulted in increased instances of online abuse, particularly targeting marginalized groups such as women. This study focuses on the classification of abusive comments in Tamil and Malayalam, two Dravidian languages widely spoken in South India. Leveraging a multilingual BERT model, this paper provides an effective approach for detecting and categorizing abusive and non-abusive text. Using labeled datasets comprising social media comments, our model demonstrates its ability to identify targeted abuse with promising accuracy. This paper outlines the dataset preparation, model architecture, training methodology, and the evaluation of results, providing a foundation for combating online abuse in low-resource languages. The methodology is distinguished by its integration of multilingual BERT and weighted loss functions to address class imbalance, showcasing a pathway for effective abuse detection in other underrepresented languages. The BERT model achieved an F1-score of 0.6519 for Tamil and 0.6601 for Malayalam. The code for this work is available on GitHub: Abusive-Text-targeting-women
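As an illustration of the weighted-loss idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes inverse-frequency ("balanced") class weights and a weighted cross-entropy normalized by the sum of weights; the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this heuristic is used for illustration only; the paper
    does not state how its weights were chosen.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class).

    probs: list of dicts mapping class name -> predicted probability.
    The sum is normalized by the total weight of the examples.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Hypothetical imbalanced toy data: 2 abusive vs 8 non-abusive comments.
labels = ["abusive"] * 2 + ["non-abusive"] * 8
weights = balanced_class_weights(labels)
probs = [{"abusive": 0.5, "non-abusive": 0.5} for _ in labels]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights, an error on the minority class contributes more to the loss than an error on the majority class, which is the usual motivation for weighting under class imbalance.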
Trio Innovators @ DravidianLangTech 2025: Multimodal Hate Speech Detection in Dravidian Languages
Radha N | Swathika R | Farha Afreen I | Annu G | Apoorva A
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
This paper presents an in-depth study on multimodal hate speech detection in Dravidian languages (Tamil, Telugu, and Malayalam) by leveraging both audio and text modalities. Detecting hate speech in these languages is particularly challenging due to factors such as code-mixing, limited linguistic resources, and diverse cultural contexts. Our approach integrates advanced techniques for audio feature extraction and XLM-RoBERTa for text representation, with feature alignment and fusion to develop a robust multimodal framework. The dataset is carefully categorized into labeled classes: gender-based, political, religious, and personal defamation hate speech, along with a non-hate category. Experimental results indicate that our model achieves a macro F1-score of 0.76 and an accuracy of approximately 85%.
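The abstract reports both a macro F1-score and an accuracy. The sketch below shows how macro F1 is computed and why it can fall below accuracy when classes are imbalanced. It is a generic illustration; the function name and the toy labels are hypothetical and are not taken from the paper's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: one minority-class example is misclassified.
y_true = ["non-hate"] * 8 + ["political"] * 2
y_pred = ["non-hate"] * 9 + ["political"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

Because each class counts equally in the average, a single missed minority-class example lowers macro F1 more than it lowers accuracy.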