PANDAS@Abusive Comment Detection in Tamil Code-Mixed Data Using Custom Embeddings with LaBSE
Krithika Swaminathan, Divyasri K, Gayathri G L, Thenmozhi Durairaj, Bharathi B
Abstract
Abusive language has lately been prevalent in comments on various social media platforms. The increasing hostility observed on the internet calls for the creation of a system that can identify and flag such acerbic content, to prevent conflict and mental distress. This task becomes more challenging when low-resource languages like Tamil, as well as the often-observed Tamil-English code-mixed text, are involved. The approach used in this paper for the classification model includes different methods of feature extraction and the use of traditional classifiers. We propose a novel method of combining language-agnostic sentence embeddings with the TF-IDF vector representation that uses a curated corpus of words as vocabulary, to create a custom embedding, which is then passed to an SVM classifier. Our experimentation yielded an accuracy of 52% and an F1-score of 0.54.
- Anthology ID:
- 2022.dravidianlangtech-1.18
- Volume:
- Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Venue:
- DravidianLangTech
- Publisher:
- Association for Computational Linguistics
- Pages:
- 112–119
- URL:
- https://aclanthology.org/2022.dravidianlangtech-1.18
- DOI:
- 10.18653/v1/2022.dravidianlangtech-1.18
- Cite (ACL):
- Krithika Swaminathan, Divyasri K, Gayathri G L, Thenmozhi Durairaj, and Bharathi B. 2022. PANDAS@Abusive Comment Detection in Tamil Code-Mixed Data Using Custom Embeddings with LaBSE. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 112–119, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- PANDAS@Abusive Comment Detection in Tamil Code-Mixed Data Using Custom Embeddings with LaBSE (Swaminathan et al., DravidianLangTech 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.dravidianlangtech-1.18.pdf
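
The abstract describes building a custom embedding by combining language-agnostic sentence embeddings with TF-IDF vectors computed over a curated vocabulary. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the two representations are combined by concatenation (the abstract does not say how), uses a smoothed IDF formula chosen for illustration, and takes the sentence encoder as a pluggable callable. The names `tfidf_vectors`, `custom_embeddings`, and `toy_encode`, as well as the example comments and vocabulary, are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def tfidf_vectors(docs: Sequence[str], vocabulary: Sequence[str]) -> List[List[float]]:
    """TF-IDF restricted to a fixed, curated vocabulary.

    Uses raw term counts and a smoothed IDF, log((1 + N) / (1 + df)) + 1.
    This weighting is an assumption made for the sketch.
    """
    tokenized = [doc.lower().split() for doc in docs]
    token_sets = [set(tokens) for tokens in tokenized]
    n_docs = len(tokenized)

    idf = {}
    for term in vocabulary:
        doc_freq = sum(term in tokens for tokens in token_sets)
        idf[term] = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] * idf[term] for term in vocabulary])
    return vectors


def custom_embeddings(
    docs: Sequence[str],
    vocabulary: Sequence[str],
    encode: Callable[[Sequence[str]], Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Concatenate each sentence embedding with its curated-vocabulary TF-IDF vector."""
    sentence_vecs = encode(docs)
    lexical_vecs = tfidf_vectors(docs, vocabulary)
    return [list(s) + t for s, t in zip(sentence_vecs, lexical_vecs)]


def toy_encode(docs: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for a sentence encoder such as LaBSE.

    Returns two crude features (character count, token count) so the
    sketch runs without downloading a model.
    """
    return [[float(len(d)), float(len(d.split()))] for d in docs]


# Made-up code-mixed comments and a made-up curated vocabulary.
docs = ["nalla video bro", "mokka video waste", "waste waste"]
vocabulary = ["mokka", "waste"]

features = custom_embeddings(docs, vocabulary, toy_encode)
for doc, vec in zip(docs, features):
    print(doc, "->", [round(v, 3) for v in vec])
```

To move closer to the setup the abstract describes, `toy_encode` could be swapped for a real LaBSE encoder (for example, the `encode` method of a `SentenceTransformer` loaded with the `sentence-transformers/LaBSE` checkpoint), and the resulting feature rows could be passed to an SVM such as scikit-learn's `SVC`. Both substitutions are suggestions here; the page does not state which libraries the authors used.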