Abstract
This paper describes the models submitted by team MUCS to the Offensive Language Identification in Dravidian Languages-EACL 2021 shared task. The task aims to identify and classify code-mixed texts in three language pairs, namely Kannada-English (Kn-En), Malayalam-English (Ma-En), and Tamil-English (Ta-En), into six predefined categories (five for Ma-En). Two models, COOLI-Ensemble and COOLI-Keras, are trained using character sequences extracted from the sentences, combined with words, as features. Of the two, COOLI-Ensemble performed best: it ranked first for Ma-En with a weighted F1-score of 0.97, and fourth and sixth for Ta-En and Kn-En with weighted F1-scores of 0.75 and 0.69, respectively.
- Anthology ID:
- 2021.dravidianlangtech-1.47
- Volume:
- Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
- Month:
- April
- Year:
- 2021
- Address:
- Kyiv
- Editors:
- Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar M, Parameswari Krishnamurthy, Elizabeth Sherly
- Venue:
- DravidianLangTech
- Publisher:
- Association for Computational Linguistics
- Pages:
- 323–329
- URL:
- https://aclanthology.org/2021.dravidianlangtech-1.47
- Cite (ACL):
- Fazlourrahman Balouchzahi, Aparna B K, and H L Shashirekha. 2021. MUCS@DravidianLangTech-EACL2021:COOLI-Code-Mixing Offensive Language Identification. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 323–329, Kyiv. Association for Computational Linguistics.
- Cite (Informal):
- MUCS@DravidianLangTech-EACL2021:COOLI-Code-Mixing Offensive Language Identification (Balouchzahi et al., DravidianLangTech 2021)
- PDF:
- https://preview.aclanthology.org/landing_page/2021.dravidianlangtech-1.47.pdf
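The abstract describes the features only at a high level ("char sequences extracted from the sentences combined with words"). The following minimal Python sketch shows one plausible reading: character n-grams plus word tokens collected into a single bag of features. It also computes the weighted F1-score used for ranking, which is the per-class F1 averaged with weights equal to each class's support. The helper names, the 2 to 4 n-gram range, and the toy labels `NOT`/`OFF` are hypothetical assumptions and are not taken from the paper. Only the standard library is used.

```python
from collections import Counter


def char_word_features(sentence, n_min=2, n_max=4):
    """Bag of character n-grams plus word tokens for one sentence.

    Hypothetical helper: the 2-4 n-gram range is an illustrative
    assumption, not the authors' reported setting.
    """
    text = sentence.lower()
    feats = Counter()
    # Character n-grams over the whole sentence (spaces included);
    # these can capture sub-word cues in romanized code-mixed text.
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            feats["c:" + text[i:i + n]] += 1
    # Whitespace-separated word tokens, prefixed so they stay distinct
    # from character n-grams with the same spelling.
    for word in text.split():
        feats["w:" + word] += 1
    return feats


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    total = 0.0
    # Classes absent from y_true have zero support, so they add nothing.
    for label in support:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        total += f1 * support[label]
    return total / len(y_true)


if __name__ == "__main__":
    print(char_word_features("ok da").most_common(3))
    # Toy, hypothetical labels: NOT = not offensive, OFF = offensive.
    print(weighted_f1(["NOT", "NOT", "OFF", "OFF"],
                      ["NOT", "OFF", "OFF", "OFF"]))
```

For the toy labels, class `NOT` has F1 2/3 and class `OFF` has F1 0.8, each with support 2, so the weighted score is 11/15 (about 0.733). In practice the feature bags would be vectorized and passed to a classifier or an ensemble of classifiers. Which classifiers COOLI-Ensemble combines is not stated in the abstract.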