MUCS@DravidianLangTech2023: Leveraging Learning Models to Identify Abusive Comments in Code-mixed Dravidian Languages

Asha Hegde, Kavya G, Sharal Coelho, Hosahalli Lakshmaiah Shashirekha


Abstract
Abusive language detection in user-generated online content has become a pressing concern due to its negative impact on users and challenges for policy makers. Online platforms are faced with the task of moderating abusive content to mitigate societal harm, adhere to legal requirements, and foster inclusivity. Despite numerous methods developed for automated detection of abusive language, the problem continues to persist. This ongoing challenge necessitates further research and development to enhance the effectiveness of abusive content detection systems and implement proactive measures to create safer and more respectful online spaces. To address the automatic detection of abusive languages in social media platforms, this paper describes the models submitted by our team - MUCS to the shared task “Abusive Comment Detection in Tamil and Telugu” at DravidianLangTech - in Recent Advances in Natural Language Processing (RANLP) 2023. This shared task addresses the abusive comment detection in code-mixed Tamil, Telugu, and romanized Tamil (Tamil-English) texts. Two distinct models: i) AbusiveML - a model implemented utilizing Linear Support Vector Classifier (LinearSVC) algorithm fed with n-grams of words and character sequences within word boundary (char_wb) features and ii) AbusiveTL - a Transfer Learning (TL ) model with three different Bidirectional Encoder Representations from Transformers (BERT) models along with random oversampling to deal with data imbalance, are submitted to the shared task for detecting abusive language in the given code-mixed texts. The AbusiveTL model fared well among these two models, with macro F1 scores of 0.46, 0.74, and 0.49 for code-mixed Tamil, Telugu, and Tamil-English texts respectively.
Anthology ID:
2023.dravidianlangtech-1.39
Volume:
Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Bharathi R. Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Sajeetha Thavareesan, Elizabeth Sherly
Venues:
DravidianLangTech | WS
SIG:
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Note:
Pages:
266–274
Language:
URL:
https://aclanthology.org/2023.dravidianlangtech-1.39
DOI:
Bibkey:
Cite (ACL):
Asha Hegde, Kavya G, Sharal Coelho, and Hosahalli Lakshmaiah Shashirekha. 2023. MUCS@DravidianLangTech2023: Leveraging Learning Models to Identify Abusive Comments in Code-mixed Dravidian Languages. In Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages, pages 266–274, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
MUCS@DravidianLangTech2023: Leveraging Learning Models to Identify Abusive Comments in Code-mixed Dravidian Languages (Hegde et al., DravidianLangTech-WS 2023)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2023.dravidianlangtech-1.39.pdf