Abstract
Detecting offensive language in under-resourced languages presents a significant real-world challenge for social media platforms. This paper is the first work focused on the issue of offensive language detection in Arabizi, an under-explored topic in an under-resourced form of Arabic. For the first time, a comprehensive and critical overview of the existing work on the topic is presented. In addition, we carry out experiments using different BERT-like models and show the feasibility of detecting offensive language in Arabizi with high accuracy. Through a thorough analysis of the results, we emphasize the complexities introduced by dialect variations and out-of-domain generalization. Our experiments use a dataset that we constructed by leveraging existing, albeit limited, resources. To facilitate further research, we make this dataset publicly accessible to the research community.
- Anthology ID: 2023.arabicnlp-1.36
- Volume: Proceedings of ArabicNLP 2023
- Month: December
- Year: 2023
- Address: Singapore (Hybrid)
- Editors: Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
- Venues: ArabicNLP | WS
- Publisher: Association for Computational Linguistics
- Pages: 423–434
- URL: https://aclanthology.org/2023.arabicnlp-1.36
- DOI: 10.18653/v1/2023.arabicnlp-1.36
- Cite (ACL): Imene Bensalem, Meryem Mout, and Paolo Rosso. 2023. Offensive Language Detection in Arabizi. In Proceedings of ArabicNLP 2023, pages 423–434, Singapore (Hybrid). Association for Computational Linguistics.
- Cite (Informal): Offensive Language Detection in Arabizi (Bensalem et al., ArabicNLP-WS 2023)
- PDF: https://preview.aclanthology.org/emnlp-22-attachments/2023.arabicnlp-1.36.pdf