OSACT4 Shared Task on Offensive Language Detection: Intensive Preprocessing-Based Approach

Fatemah Husain


Abstract
The preprocessing phase is one of the key phases of the text classification pipeline. This study investigates the impact of preprocessing on text classification, specifically on offensive language and hate speech classification for Arabic text. The Arabic used in social media is informal and written in various Arabic dialects, which makes text classification very complex. Preprocessing helps reduce dimensionality and remove useless content. We apply intensive preprocessing techniques to the dataset before processing it further and feeding it into the classification model. This intensive preprocessing-based approach has a significant impact on the offensive language detection and hate speech detection shared tasks of the fourth Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT). Our team won third place in Sub-Task A (Offensive Language Detection) and first place in Sub-Task B (Hate Speech Detection), with F1 scores of 89% and 95%, respectively, achieving state-of-the-art performance in terms of F1, accuracy, recall, and precision for Arabic hate speech detection.
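The abstract does not list the specific preprocessing steps used. As an illustration only, the following sketch applies several normalization steps that are commonly used for Arabic social-media text (removing URLs, user mentions, diacritics, and tatweel; unifying alef variants, alef maqsura, and ta marbuta; collapsing letter elongation; dropping non-Arabic characters) and compares vocabulary sizes before and after, to show how such steps reduce dimensionality. The function names `normalize` and `vocab`, the sample tweets, and the choice and order of steps are all assumptions, not the author's actual pipeline.

```python
import re

# Hypothetical normalization steps commonly applied to Arabic social-media text;
# NOT the exact pipeline from the paper, which the abstract does not specify.
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@\w+")
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # tashkeel marks (fathatan..sukun) and superscript alef
TATWEEL = "\u0640"                                  # elongation character
REPEAT = re.compile(r"(.)\1{2,}")                   # a character repeated three or more times
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")      # anything other than Arabic letters and whitespace


def normalize(text: str) -> str:
    """Return a normalized version of one Arabic tweet (illustrative sketch)."""
    text = URL.sub(" ", text)              # drop links
    text = MENTION.sub(" ", text)          # drop @user mentions
    text = DIACRITICS.sub("", text)        # strip short-vowel marks
    text = text.replace(TATWEEL, "")       # strip tatweel
    text = re.sub("[إأآ]", "ا", text)      # unify alef variants to bare alef
    text = text.replace("ى", "ي")          # alef maqsura -> ya
    text = text.replace("ة", "ه")          # ta marbuta -> ha
    text = REPEAT.sub(r"\1", text)         # collapse elongated letters to one
    text = NON_ARABIC.sub(" ", text)       # drop digits, Latin, emoji, punctuation
    return " ".join(text.split())          # squeeze whitespace


def vocab(texts):
    """Set of whitespace-separated tokens across all texts."""
    return {tok for t in texts for tok in t.split()}


# Three spellings of the same greeting (hypothetical sample data).
tweets = [
    "مرحبا\u064b بالجميع",                        # with tanween (a diacritic)
    "مرحب" + "\u0640" * 4 + "ا بالجميع @user1",   # with tatweel and a mention
    "مرحبا بالجميع http://t.co/abc",              # with a URL
]

print(len(vocab(tweets)), len(vocab(normalize(t) for t in tweets)))  # → 6 2
```

Mapping all three spellings to the same tokens shrinks the vocabulary from 6 to 2 types. The trade-off is that aggressive normalization can merge words that differ only in the removed characters, so which steps to apply is a design choice to validate against the classifier's results.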
Anthology ID: 2020.osact-1.8
Volume: Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection
Month: May
Year: 2020
Address: Marseille, France
Editors: Hend Al-Khalifa, Walid Magdy, Kareem Darwish, Tamer Elsayed, Hamdy Mubarak
Venue: OSACT
Publisher: European Language Resource Association
Pages: 53–60
Language: English
URL: https://aclanthology.org/2020.osact-1.8
Cite (ACL): Fatemah Husain. 2020. OSACT4 Shared Task on Offensive Language Detection: Intensive Preprocessing-Based Approach. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, pages 53–60, Marseille, France. European Language Resource Association.
Cite (Informal): OSACT4 Shared Task on Offensive Language Detection: Intensive Preprocessing-Based Approach (Husain, OSACT 2020)
PDF: https://preview.aclanthology.org/nschneid-patch-4/2020.osact-1.8.pdf