Abstract
This paper describes the systems we submitted to the Second Nuanced Arabic Dialect Identification Shared Task (NADI 2021). Dialect identification is the task of automatically detecting the source variety of a given text or speech segment. The shared task comprises four subtasks: two for country-level identification and two for province-level identification. The data covers a total of 100 provinces from all 21 Arab countries and comes from the Twitter domain. The proposed systems rely on five machine-learning classifiers, namely Complement Naïve Bayes, Support Vector Machine, Decision Tree, Logistic Regression, and Random Forest. The Complement Naïve Bayes classifier achieved a higher macro-averaged F1 score than all other classifiers on both the development and test data.
- Anthology ID:
- 2021.wanlp-1.34
- Volume:
- Proceedings of the Sixth Arabic Natural Language Processing Workshop
- Month:
- April
- Year:
- 2021
- Address:
- Kyiv, Ukraine (Virtual)
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 287–290
- URL:
- https://aclanthology.org/2021.wanlp-1.34
- Cite (ACL):
- Hamada Nayel, Ahmed Hassan, Mahmoud Sobhi, and Ahmed El-Sawy. 2021. Machine Learning-Based Approach for Arabic Dialect Identification. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 287–290, Kyiv, Ukraine (Virtual). Association for Computational Linguistics.
- Cite (Informal):
- Machine Learning-Based Approach for Arabic Dialect Identification (Nayel et al., WANLP 2021)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2021.wanlp-1.34.pdf
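
The abstract compares classifiers by their macro-averaged F1 score. As a rough illustration of that metric only (this is not the authors' evaluation code), the sketch below computes per-class F1 and takes the unweighted mean over classes. The function name `macro_f1` and the country labels and predictions are made-up toy values, not data from the shared task.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over every label seen in gold or predicted."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")

    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1      # correct prediction for class g
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[g] += 1      # g was the true class but was missed

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (hypothetical labels, not NADI data)
gold = ["Egypt", "Egypt", "Egypt", "Iraq", "Oman"]
predicted = ["Egypt", "Egypt", "Iraq", "Iraq", "Egypt"]
score = macro_f1(gold, predicted)
print(f"macro-F1 = {score:.4f}")
```

Because every class contributes equally to the mean, a class that is never predicted correctly (here "Oman") pulls the score down sharply even though three of the five toy predictions are right. This is why macro-averaging is a common choice when class sizes are imbalanced, as is typical when labels span many countries or provinces.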