TAC at SemEval-2020 Task 12: Ensembling Approach for Multilingual Offensive Language Identification in Social Media

Talha Anwar, Omer Baig


Abstract
The use of offensive language on social media is becoming more common, and there is a need for a mechanism to detect and control it. This paper addresses offensive language detection in five languages: English, Arabic, Danish, Greek and Turkish. We present a nearly identical ensemble pipeline, comprising machine learning and deep learning models, for all five languages. Three machine learning and four deep learning models were used in the ensemble. In the OffensEval-2020 competition, our model achieved F1-scores of 0.85, 0.74, 0.68, 0.81, and 0.9 for the Arabic, Turkish, Danish, Greek and English tasks, respectively.
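
The abstract does not say how the seven models' outputs are combined. As an illustration only, the sketch below assumes hard majority voting over per-model labels. The function name `majority_vote` and the example predictions are hypothetical and are not taken from the paper or its repository. The labels `OFF` and `NOT` follow the OLID convention.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label lists by hard majority voting.

    Assumption: the paper's actual combination rule is not stated in the
    abstract; majority voting is used here purely as an illustration.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must label the same number of items")
    # zip(*predictions) regroups the labels so each tuple holds every
    # model's vote for one post; the most frequent label wins.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Made-up outputs of seven models (three ML + four DL) on three posts.
model_outputs = [
    ["OFF", "NOT", "NOT"],
    ["OFF", "NOT", "OFF"],
    ["OFF", "OFF", "NOT"],
    ["NOT", "NOT", "NOT"],
    ["OFF", "NOT", "OFF"],
    ["OFF", "NOT", "OFF"],
    ["NOT", "NOT", "OFF"],
]
ensemble_labels = majority_vote(model_outputs)
print(ensemble_labels)
```

With an odd number of models and two labels, a vote can never tie, which is one practical reason to ensemble seven models rather than six or eight.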
Anthology ID:
2020.semeval-1.289
Volume:
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Month:
December
Year:
2020
Address:
Barcelona (online)
Editors:
Aurelie Herbelot, Xiaodan Zhu, Alexis Palmer, Nathan Schneider, Jonathan May, Ekaterina Shutova
Venue:
SemEval
SIG:
SIGLEX
Publisher:
International Committee for Computational Linguistics
Pages:
2177–2182
URL:
https://aclanthology.org/2020.semeval-1.289
DOI:
10.18653/v1/2020.semeval-1.289
Cite (ACL):
Talha Anwar and Omer Baig. 2020. TAC at SemEval-2020 Task 12: Ensembling Approach for Multilingual Offensive Language Identification in Social Media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 2177–2182, Barcelona (online). International Committee for Computational Linguistics.
Cite (Informal):
TAC at SemEval-2020 Task 12: Ensembling Approach for Multilingual Offensive Language Identification in Social Media (Anwar & Baig, SemEval 2020)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/2020.semeval-1.289.pdf
Code
talhaanwarch/offenseeval2020
Data
OLID