indicnlp@kgp at DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages
Abstract
The paper aims to classify different types of offensive content in three code-mixed Dravidian language datasets. The work leverages existing state-of-the-art approaches in text classification by incorporating additional data and transfer learning on pre-trained models. Our final submission is an ensemble of an AWD-LSTM based model along with two different transformer model architectures based on BERT and RoBERTa. We achieved weighted-average F1 scores of 0.97, 0.77, and 0.72 on the Malayalam-English, Tamil-English, and Kannada-English datasets, ranking 1st, 2nd, and 3rd on the respective shared-task leaderboards.
- Anthology ID:
- 2021.dravidianlangtech-1.48
- Volume:
- Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
- Month:
- April
- Year:
- 2021
- Address:
- Kyiv
- Venue:
- DravidianLangTech
- Publisher:
- Association for Computational Linguistics
- Pages:
- 330–335
- URL:
- https://aclanthology.org/2021.dravidianlangtech-1.48
- Cite (ACL):
- Kushal Kedia and Abhilash Nandy. 2021. indicnlp@kgp at DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 330–335, Kyiv. Association for Computational Linguistics.
- Cite (Informal):
- indicnlp@kgp at DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages (Kedia & Nandy, DravidianLangTech 2021)
- PDF:
- https://preview.aclanthology.org/starsem-semeval-split/2021.dravidianlangtech-1.48.pdf
- Code:
- kushal2000/Dravidian-Offensive-Language-Identification
- Data:
- OLID