Abstract
This paper describes the systems submitted by the Arabic Language Technology group (ALT) at SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media. We focus on sub-task A (Offensive Language Identification) for two languages: Arabic and English. Our efforts for both languages achieved more than 90% macro-averaged F1-score on the official test set. For Arabic, the best results were obtained by a system combination of Support Vector Machine, Deep Neural Network, and fine-tuned Bidirectional Encoder Representations from Transformers (BERT). For English, the best results were obtained by fine-tuning BERT.

- Anthology ID:
- 2020.semeval-1.249
- Volume:
- Proceedings of the Fourteenth Workshop on Semantic Evaluation
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona (online)
- Editors:
- Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- International Committee for Computational Linguistics
- Pages:
- 1891–1897
- URL:
- https://aclanthology.org/2020.semeval-1.249
- DOI:
- 10.18653/v1/2020.semeval-1.249
- Cite (ACL):
- Sabit Hassan, Younes Samih, Hamdy Mubarak, and Ahmed Abdelali. 2020. ALT at SemEval-2020 Task 12: Arabic and English Offensive Language Identification in Social Media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1891–1897, Barcelona (online). International Committee for Computational Linguistics.
- Cite (Informal):
- ALT at SemEval-2020 Task 12: Arabic and English Offensive Language Identification in Social Media (Hassan et al., SemEval 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2020.semeval-1.249.pdf
- Data
- OLID, WikiDetox