Charangan Vasantharajan


2021

Hypers@DravidianLangTech-EACL2021: Offensive language identification in Dravidian code-mixed YouTube Comments and Posts
Charangan Vasantharajan | Uthayasanker Thayasivam
Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages

Code-mixed offensive content has become pervasive in social media posts over the last few years. Consequently, it has attracted significant attention from the research community, which has worked on identifying the different forms of such content (e.g., hate speech and sentiment) and contributed to the creation of datasets. Most recent studies deal with high-resource languages (e.g., English) because many datasets are publicly available for them; owing to the lack of datasets for low-resource languages, those languages have received comparatively little attention. Therefore, this study focuses on offensive language identification in code-mixed, low-resource Dravidian languages (Tamil, Kannada, and Malayalam) using a bidirectional approach and fine-tuning strategies. According to the leaderboard, the proposed model achieved an F1-score of 0.96 for Malayalam, 0.73 for Tamil, and 0.70 for Kannada on the benchmark. Moreover, among multilingual models, this model ranked 3rd, achieved favorable results, and was confirmed as the best among all systems submitted to these shared tasks across these three languages.