CSSCUTN@DravidianLangTech:Abusive comments Detection in Tamil and Telugu
Kathiravan Pannerselvam, Saranya Rajiakodi, Rahul Ponnusamy, Sajeetha Thavareesan
Abstract
Code-mixing is the word- or phrase-level alternation between two or more languages within a single sentence, whether in conversation or in written text. This phenomenon is widespread on social media platforms, and identifying abusive comments in code-mixed sentences is a complex challenge. We present the system we submitted to the DravidianLangTech Shared Task on Abusive Comment Detection in Tamil and Telugu. Our approach builds a multiclass abusive comment detection model that recognizes 8 different labels. The provided samples are code-mixed Tamil-English text, in which Tamil is written in romanised form. We focused on the multiclass classification subtask and used Support Vector Machine (SVM), Random Forest (RF), and Logistic Regression (LR) classifiers. Our method ranked ninth among all competing systems for the classification of abusive comments in code-mixed text. Our proposed classifier, which combines TF-IDF features with an SVM, achieves an accuracy of 0.99 and an F1-score of 0.99 on a balanced dataset. It can be used to detect abusive comments in Tamil-English code-mixed text.
- Anthology ID:
- 2023.dravidianlangtech-1.45
- Volume:
- Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages
- Month:
- September
- Year:
- 2023
- Address:
- Varna, Bulgaria
- Editors:
- Bharathi R. Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Sajeetha Thavareesan, Elizabeth Sherly
- Venues:
- DravidianLangTech | WS
- Publisher:
- INCOMA Ltd., Shoumen, Bulgaria
- Pages:
- 306–312
- URL:
- https://aclanthology.org/2023.dravidianlangtech-1.45
- Cite (ACL):
- Kathiravan Pannerselvam, Saranya Rajiakodi, Rahul Ponnusamy, and Sajeetha Thavareesan. 2023. CSSCUTN@DravidianLangTech:Abusive comments Detection in Tamil and Telugu. In Proceedings of the Third Workshop on Speech and Language Technologies for Dravidian Languages, pages 306–312, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal):
- CSSCUTN@DravidianLangTech:Abusive comments Detection in Tamil and Telugu (Pannerselvam et al., DravidianLangTech-WS 2023)
- PDF:
- https://preview.aclanthology.org/naacl-24-ws-corrections/2023.dravidianlangtech-1.45.pdf
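The abstract reports its best result with TF-IDF features fed to an SVM. The sketch below illustrates only the TF-IDF feature-extraction step; it is not the authors' code. The abstract does not state the tokeniser, n-gram range, or IDF variant, so this sketch assumes lowercase whitespace unigrams, the smoothed IDF ln((1 + n) / (1 + df)) + 1, and L2 normalisation (the latter two match scikit-learn's `TfidfVectorizer` defaults). The helper names `tokenize`, `fit_idf`, and `transform` and the sample comments are hypothetical placeholders, not shared-task data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split. Romanised Tamil has many
    # spelling variants that a real system would need to normalise.
    return text.lower().split()


def fit_idf(docs):
    """Return (vocabulary, idf) using smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    terms = sorted(df)
    vocab = {term: idx for idx, term in enumerate(terms)}
    idf = [math.log((1 + n) / (1 + df[term])) + 1.0 for term in terms]
    return vocab, idf


def transform(docs, vocab, idf):
    """Map each document to an L2-normalised sparse TF-IDF vector {index: weight}."""
    rows = []
    for doc in docs:
        # Raw term counts; out-of-vocabulary terms are ignored
        counts = Counter(t for t in tokenize(doc) if t in vocab)
        vec = {vocab[t]: c * idf[vocab[t]] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Documents with no known terms become empty (all-zero) vectors
        rows.append({i: w / norm for i, w in vec.items()} if norm else {})
    return rows


if __name__ == "__main__":
    # Hypothetical placeholder comments, not shared-task data
    train = ["nalla video bro", "worst video da", "nalla song"]
    vocab, idf = fit_idf(train)
    for row in transform(train, vocab, idf):
        print({term: round(row[i], 3) for term, i in vocab.items() if i in row})
```

Terms that appear in fewer comments receive a higher IDF weight, so in "nalla video bro" the rarer token "bro" outweighs "video". In the system described above, such vectors were passed to SVM, RF, and LR classifiers; the abstract does not say which SVM implementation or kernel was used. With scikit-learn, a comparable setup would typically chain `TfidfVectorizer` and `LinearSVC` in a `Pipeline`.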