ALT at SemEval-2020 Task 12: Arabic and English Offensive Language Identification in Social Media

Sabit Hassan, Younes Samih, Hamdy Mubarak, Ahmed Abdelali


Abstract
This paper describes the systems submitted by the Arabic Language Technology group (ALT) at SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media. We focus on sub-task A (Offensive Language Identification) for two languages: Arabic and English. Our efforts for both languages achieved more than 90% macro-averaged F1-score on the official test set. For Arabic, the best results were obtained by a system combination of Support Vector Machine, Deep Neural Network, and fine-tuned Bidirectional Encoder Representations from Transformers (BERT). For English, the best results were obtained by fine-tuning BERT.
Anthology ID:
2020.semeval-1.249
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Editors:
Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Venue:
SemEval
SIG:
SIGLEX
Publisher:
International Committee for Computational Linguistics
Pages:
1891–1897
URL:
https://aclanthology.org/2020.semeval-1.249
DOI:
10.18653/v1/2020.semeval-1.249
Cite (ACL):
Sabit Hassan, Younes Samih, Hamdy Mubarak, and Ahmed Abdelali. 2020. ALT at SemEval-2020 Task 12: Arabic and English Offensive Language Identification in Social Media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1891–1897, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal):
ALT at SemEval-2020 Task 12: Arabic and English Offensive Language Identification in Social Media (Hassan et al., SemEval 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2020.semeval-1.249.pdf
Data
OLID, WikiDetox