Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification
Bashar Talafha, Wael Farhan, Ahmed Altakrouri, Hussein Al-Natsheh
Abstract
Arabic dialect identification is an inherently complex problem, as Arabic dialect taxonomy is convoluted and aims to dissect a continuous space rather than a discrete one. In this work, we present machine and deep learning approaches to predict 21 fine-grained dialects from a set of given tweets per user. We adopted numerous feature extraction methods, most of which showed improvement in the final model, such as word embedding, TF-IDF, and other tweet features. Our results show that a simple LinearSVC can outperform any complex deep learning model given a set of curated features. With a relatively complex user voting mechanism, we were able to achieve a Macro-Averaged F1-score of 71.84% on MADAR shared subtask-2. Our best submitted model ranked second out of all participating teams.
- Anthology ID:
- W19-4629
- Volume:
- Proceedings of the Fourth Arabic Natural Language Processing Workshop
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 239–243
- URL:
- https://preview.aclanthology.org/ingest_wac_2008/W19-4629/
- DOI:
- 10.18653/v1/W19-4629
- Cite (ACL):
- Bashar Talafha, Wael Farhan, Ahmed Altakrouri, and Hussein Al-Natsheh. 2019. Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 239–243, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Mawdoo3 AI at MADAR Shared Task: Arabic Tweet Dialect Identification (Talafha et al., WANLP 2019)
- PDF:
- https://preview.aclanthology.org/ingest_wac_2008/W19-4629.pdf
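
The abstract mentions predicting one dialect per user from that user's tweets through a voting mechanism. The abstract does not describe that mechanism, so the following is only a minimal sketch of the simplest variant, a plain majority vote over per-tweet predictions. The function name `vote_user_dialects`, the user IDs, and the label strings are hypothetical and chosen for illustration; the authors' actual mechanism is described as relatively complex and may differ substantially.

```python
from collections import Counter, defaultdict


def vote_user_dialects(tweet_predictions):
    """Aggregate per-tweet dialect predictions into one label per user.

    tweet_predictions: iterable of (user_id, predicted_label) pairs,
    e.g. the output of any tweet-level classifier.
    Returns a dict mapping each user_id to its most frequent label.
    NOTE: plain majority voting is an assumption for illustration,
    not the voting scheme used in the paper.
    """
    votes = defaultdict(Counter)
    for user_id, label in tweet_predictions:
        votes[user_id][label] += 1
    # On a tie, most_common keeps the label that was counted first.
    return {user: counts.most_common(1)[0][0] for user, counts in votes.items()}


# Hypothetical tweet-level predictions for two users.
predictions = [
    ("user_a", "Egypt"), ("user_a", "Egypt"), ("user_a", "Jordan"),
    ("user_b", "Saudi_Arabia"), ("user_b", "Kuwait"), ("user_b", "Saudi_Arabia"),
]
user_labels = vote_user_dialects(predictions)
```

Here each user receives the label predicted for most of their tweets. A weighted variant that sums classifier confidence scores instead of raw counts would be a natural extension, though whether the paper does this is not stated in the abstract.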