Abstract
In this paper, we describe our approach of utilizing pre-trained BERT models with Convolutional Neural Networks for sub-task A of the Multilingual Offensive Language Identification shared task (OffensEval 2020), which is part of SemEval 2020. We show that combining a CNN with BERT performs better than using BERT on its own, and we emphasize the importance of utilizing pre-trained language models for downstream tasks. Our system ranked 4th in Arabic with a macro-averaged F1-score of 0.897, 4th in Greek with a score of 0.843, and 3rd in Turkish with a score of 0.814. Additionally, we present ArabicBERT, a set of pre-trained transformer language models for Arabic that we share with the community.
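As a rough illustration of the BERT-CNN combination described in the abstract, the sketch below assumes PyTorch and the HuggingFace transformers library; the model name, kernel sizes, filter count, and dropout rate are illustrative placeholders rather than the authors' exact hyperparameters, whose implementation is in the linked alisafaya/OffensEval2020 repository.

```python
# Minimal sketch (not the authors' exact implementation): BERT token
# representations are fed to parallel 1-D convolutions, max-pooled over the
# sequence, and passed to a linear classifier.
import torch
import torch.nn as nn
from transformers import AutoModel


class BertCNN(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased",  # placeholder model
                 num_labels=2, kernel_sizes=(2, 3, 4, 5), num_filters=32):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # One Conv1d per kernel size over the sequence of token embeddings.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes
        )
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d.
        hidden_states = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state.transpose(1, 2)
        # Convolve, apply ReLU, and global-max-pool each feature map.
        pooled = [torch.relu(conv(hidden_states)).max(dim=2).values
                  for conv in self.convs]
        return self.classifier(self.dropout(torch.cat(pooled, dim=1)))


# Example usage (illustrative only):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
# batch = tok(["example tweet"], return_tensors="pt", padding=True, truncation=True)
# logits = BertCNN()(batch["input_ids"], batch["attention_mask"])
```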
- Anthology ID: 2020.semeval-1.271
- Volume: Proceedings of the Fourteenth Workshop on Semantic Evaluation
- Month: December
- Year: 2020
- Address: Barcelona (online)
- Editors: Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
- Venue: SemEval
- SIG: SIGLEX
- Publisher: International Committee for Computational Linguistics
- Pages: 2054–2059
- URL: https://aclanthology.org/2020.semeval-1.271
- DOI: 10.18653/v1/2020.semeval-1.271
- Cite (ACL): Ali Safaya, Moutasem Abdullatif, and Deniz Yuret. 2020. KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 2054–2059, Barcelona (online). International Committee for Computational Linguistics.
- Cite (Informal): KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech Identification in Social Media (Safaya et al., SemEval 2020)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/2020.semeval-1.271.pdf
- Code: alisafaya/OffensEval2020