Abstract
Arabic has a wide range of dialects, where a dialect is the language variety of a specific community. In this paper, we present the models we built to participate in the third Nuanced Arabic Dialect Identification (NADI) shared task (Subtask 1), which involves developing a system that classifies a tweet into a country-level dialect. We used a number of machine learning techniques as well as deep learning and transformer-based models. For the machine learning approach, we built an ensemble classifier from various machine learning models. For the deep learning approach, we considered a bidirectional LSTM model and the pretrained AraBERT model. The results show that the deep learning approach performs noticeably better than the machine learning approaches, reaching 68.7% accuracy on the development set.
- Anthology ID:
- 2022.wanlp-1.50
- Volume:
- Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates (Hybrid)
- Editors:
- Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 464–467
- URL:
- https://aclanthology.org/2022.wanlp-1.50
- DOI:
- 10.18653/v1/2022.wanlp-1.50
- Cite (ACL):
- Nouf AlShenaifi and Aqil Azmi. 2022. Arabic dialect identification using machine learning and transformer-based models: Submission to the NADI 2022 Shared Task. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 464–467, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- Arabic dialect identification using machine learning and transformer-based models: Submission to the NADI 2022 Shared Task (AlShenaifi & Azmi, WANLP 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.wanlp-1.50.pdf