DE-ABUSE@TamilNLP-ACL 2022: Transliteration as Data Augmentation for Abuse Detection in Tamil

Vasanth Palanikumar, Sean Benhur, Adeep Hande, Bharathi Raja Chakravarthi


Abstract
With the rise of social media and internet, there is a necessity to provide an inclusive space and prevent the abusive topics against any gender, race or community. This paper describes the system submitted to the ACL-2022 shared task on fine-grained abuse detection in Tamil. In our approach we transliterated code-mixed dataset as an augmentation technique to increase the size of the data. Using this method we were able to rank 3rd on the task with a 0.290 macro average F1 score and a 0.590 weighted F1 score.
Anthology ID:
2022.dravidianlangtech-1.5
Volume:
Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
DravidianLangTech
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
33–38
Language:
URL:
https://aclanthology.org/2022.dravidianlangtech-1.5
DOI:
10.18653/v1/2022.dravidianlangtech-1.5
Bibkey:
Cite (ACL):
Vasanth Palanikumar, Sean Benhur, Adeep Hande, and Bharathi Raja Chakravarthi. 2022. DE-ABUSE@TamilNLP-ACL 2022: Transliteration as Data Augmentation for Abuse Detection in Tamil. In Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages, pages 33–38, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
DE-ABUSE@TamilNLP-ACL 2022: Transliteration as Data Augmentation for Abuse Detection in Tamil (Palanikumar et al., DravidianLangTech 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.dravidianlangtech-1.5.pdf
Video:
https://preview.aclanthology.org/ingestion-script-update/2022.dravidianlangtech-1.5.mp4