Abstract
In this paper, we describe our experiments on country-level Arabic dialect identification and compare a set of classifiers. The best results were achieved with the Linear Support Vector Classification (LSVC) model combined with Random Over Sampling (ROS), yielding an F1-score of 18.74% in the post-evaluation phase. In the evaluation phase, our best submitted system achieved an F1-score of 18.27%, very close to the average F1-score (18.80%) across all submitted systems.
- Anthology ID:
- 2020.wanlp-1.24
- Volume:
- Proceedings of the Fifth Arabic Natural Language Processing Workshop
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Imed Zitouni, Muhammad Abdul-Mageed, Houda Bouamor, Fethi Bougares, Mahmoud El-Haj, Nadi Tomeh, Wajdi Zaghouani
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 250–256
- URL:
- https://aclanthology.org/2020.wanlp-1.24
- Cite (ACL):
- Mohamed Lichouri and Mourad Abbas. 2020. Simple vs Oversampling-based Classification Methods for Fine Grained Arabic Dialect Identification in Twitter. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 250–256, Barcelona, Spain (Online). Association for Computational Linguistics.
- Cite (Informal):
- Simple vs Oversampling-based Classification Methods for Fine Grained Arabic Dialect Identification in Twitter (Lichouri & Abbas, WANLP 2020)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2020.wanlp-1.24.pdf