Abstract
In this paper, we describe our participation in the NADI 2023 shared task on classifying Arabic dialects in tweets. The organizers provide a primary dataset of tweets from 18 Arab countries for training, evaluation, and testing, along with three older datasets. The main objective is to develop a model that classifies tweets by country across these 18 countries. We outline our approach, which leverages various machine learning models. Our experiments demonstrate that large language models, particularly AraBERTv2-Large, AraBERTv2-Base, and CAMeLBERT-Mix DID MADAR, consistently outperform traditional methods such as SVM, XGBoost, Multinomial Naive Bayes, AdaBoost, and Random Forests.
- Anthology ID:
- 2023.arabicnlp-1.72
- Volume:
- Proceedings of ArabicNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore (Hybrid)
- Editors:
- Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Ahmed Abdelali, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Khalil Mrini, Rawan Almatham
- Venues:
- ArabicNLP | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 665–669
- URL:
- https://aclanthology.org/2023.arabicnlp-1.72
- DOI:
- 10.18653/v1/2023.arabicnlp-1.72
- Cite (ACL):
- Yash Hatekar and Muhammad Abdo. 2023. IUNADI at NADI 2023 shared task: Country-level Arabic Dialect Classification in Tweets for the Shared Task NADI 2023. In Proceedings of ArabicNLP 2023, pages 665–669, Singapore (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- IUNADI at NADI 2023 shared task: Country-level Arabic Dialect Classification in Tweets for the Shared Task NADI 2023 (Hatekar & Abdo, ArabicNLP-WS 2023)
- PDF:
- https://preview.aclanthology.org/revert-3132-ingestion-checklist/2023.arabicnlp-1.72.pdf