Abstract
This paper describes our solution submitted to the shared task on Offensive Language Identification in Dravidian Languages. We participated in all three offensive language identification tasks. To address them, we explored multilingual models based on XLM-RoBERTa and multilingual BERT, trained on the mixed data of the three code-mixed languages. In addition, we addressed the class imbalance in the training data through class combination, class weights, and focal loss. Our model achieved weighted average F1 scores of 0.75 (ranked 4th), 0.94 (ranked 4th), and 0.72 (ranked 3rd) on the Tamil-English, Malayalam-English, and Kannada-English tasks, respectively.
- Anthology ID: 2021.dravidianlangtech-1.21
- Volume: Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages
- Month: April
- Year: 2021
- Address: Kyiv
- Venue: DravidianLangTech
- Publisher: Association for Computational Linguistics
- Pages: 164–168
- URL: https://aclanthology.org/2021.dravidianlangtech-1.21
- Cite (ACL): Zichao Li. 2021. Codewithzichao@DravidianLangTech-EACL2021: Exploring Multilingual Transformers for Offensive Language Identification on Code Mixing Text. In Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages, pages 164–168, Kyiv. Association for Computational Linguistics.
- Cite (Informal): Codewithzichao@DravidianLangTech-EACL2021: Exploring Multilingual Transformers for Offensive Language Identification on Code Mixing Text (Li, DravidianLangTech 2021)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2021.dravidianlangtech-1.21.pdf
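
The abstract names class weights and focal loss as remedies for class imbalance but gives no formulation. The following is a minimal sketch of how the two are commonly combined, not the paper's implementation: the function names, the inverse-frequency weighting scheme, and the default `gamma=2.0` are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(labels, num_classes):
    """Hypothetical weighting scheme: n / (num_classes * count) per class.

    Rare classes receive larger weights; classes with no examples get 0.
    """
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c else 0.0 for c in counts]


def focal_loss(logits, target, class_weights=None, gamma=2.0):
    """Focal loss for one example: -alpha_t * (1 - p_t)^gamma * log(p_t).

    With gamma = 0 and no class weights this reduces to cross-entropy.
    gamma = 2.0 is an assumed default, not a value taken from the paper.
    """
    p_t = softmax(logits)[target]
    alpha_t = 1.0 if class_weights is None else class_weights[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Toy label distribution: class 1 is the minority class.
    weights = inverse_frequency_weights([0, 0, 0, 1], num_classes=2)
    print(weights)
    print(focal_loss([2.0, 0.5], target=1, class_weights=weights))
```

The `(1 - p_t) ** gamma` factor shrinks the loss on examples the model already classifies confidently, so training signal concentrates on hard examples, while the per-class weight scales up the contribution of rare classes.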