Building an Ensemble of Transformer Models for Arabic Dialect Classification and Sentiment Analysis
Abdullah Salem Khered, Ingy Yasser Hassan Abdou Abdelhalim, Riza Batista-Navarro
Abstract
In this paper, we describe the approaches we developed for the Nuanced Arabic Dialect Identification (NADI) 2022 shared task, which consists of two subtasks: country-level Arabic dialect identification and sentiment analysis. Our team, UniManc, based its approaches to both subtasks on the same model: a pre-trained MARBERT language model. For Subtask 1, we applied undersampling to create versions of the training data with a balanced distribution across classes. For Subtask 2, we further pre-trained the original MARBERT model with the masked language modelling objective on a NADI-provided dataset of unlabelled Arabic tweets. For each subtask, MARBERT was fine-tuned for sequence classification with different values of hyperparameters such as the random seed and learning rate. The resulting model variants formed the basis of an ensemble model for each subtask. In the official NADI evaluation, our ensemble model obtained a macro-F1-score of 26.863 on the first subtask, ranking second overall. On the second subtask, our ensemble model also ranked second, with a macro-F1-PN score (macro-averaged F1-score over the Positive and Negative classes) of 73.544.
- Anthology ID:
- 2022.wanlp-1.53
- Volume:
- Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates (Hybrid)
- Editors:
- Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 479–484
- URL:
- https://aclanthology.org/2022.wanlp-1.53
- DOI:
- 10.18653/v1/2022.wanlp-1.53
- Cite (ACL):
- Abdullah Salem Khered, Ingy Yasser Hassan Abdou Abdelhalim, and Riza Batista-Navarro. 2022. Building an Ensemble of Transformer Models for Arabic Dialect Classification and Sentiment Analysis. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 479–484, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal):
- Building an Ensemble of Transformer Models for Arabic Dialect Classification and Sentiment Analysis (Khered et al., WANLP 2022)
- PDF:
- https://preview.aclanthology.org/landing_page/2022.wanlp-1.53.pdf
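The Subtask 1 step in the abstract, undersampling the training data so that every dialect class is equally represented, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the data is a list of `(text, label)` pairs and that each class is randomly cut down to the size of the smallest class; the function name `undersample` and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def undersample(examples, seed=0):
    """Randomly undersample every class to the size of the smallest class.

    `examples` is assumed (hypothetically) to be a list of (text, label) pairs.
    """
    # Group the examples by their class label.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    # Every class keeps as many examples as the rarest class has.
    n_min = min(len(items) for items in by_label.values())

    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        # Sample without replacement within each class.
        balanced.extend(rng.sample(by_label[label], n_min))

    # Shuffle so classes are interleaved rather than grouped.
    rng.shuffle(balanced)
    return balanced
```

Calling `undersample` with different `seed` values yields different balanced versions of the same training set, which is one plausible way to obtain the multiple training-data versions the abstract mentions.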
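The macro-F1-PN metric reported for Subtask 2 averages the F1-scores of the Positive and Negative classes only. Neutral still affects the score through false positives and false negatives, but its own F1 is not averaged in. The sketch below computes it from scratch and scales it by 100 to match the scale of the reported score; the label strings `"pos"`, `"neg"` and `"neut"` are assumptions, not the official NADI label names.

```python
def f1_for_class(gold, pred, cls):
    """Per-class F1 computed from true positives, false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        # No correct predictions: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1_pn(gold, pred, pos="pos", neg="neg"):
    """Macro-average of the Positive and Negative F1-scores, scaled to 0-100.

    The default label names are hypothetical placeholders.
    """
    return 100 * (f1_for_class(gold, pred, pos) + f1_for_class(gold, pred, neg)) / 2
```

For gold labels `["pos", "neg", "neut", "pos", "neg", "neut"]` and predictions `["pos", "neg", "pos", "neg", "neg", "neut"]`, the Positive F1 is 0.5 and the Negative F1 is 0.8, giving a macro-F1-PN of 65.0. scikit-learn's `f1_score(gold, pred, labels=["pos", "neg"], average="macro")` should give the same value on the 0 to 1 scale.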
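The abstract says the fine-tuned MARBERT variants form the basis of an ensemble but does not state here how their outputs are combined. One common option is hard majority voting, sketched below purely as an assumption: each variant contributes one predicted label per example, the most frequent label wins, and ties go to the label chosen by the earliest-listed model. The function name `majority_vote` and the input layout are hypothetical.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label predictions by hard majority voting.

    predictions_per_model[m][i] is (hypothetically) model m's label for example i.
    """
    n_examples = len(predictions_per_model[0])
    if any(len(preds) != n_examples for preds in predictions_per_model):
        raise ValueError("all models must predict the same number of examples")

    ensemble = []
    for i in range(n_examples):
        votes = [preds[i] for preds in predictions_per_model]
        counts = Counter(votes)
        best = max(counts.values())
        # Among the tied top labels, keep the one voted for by the earliest model.
        ensemble.append(next(v for v in votes if counts[v] == best))
    return ensemble
```

Hard voting only needs each variant's final labels; averaging the variants' softmax probabilities is an equally plausible alternative that uses more of each model's output.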