Arabic Dialect Identification for Travel and Twitter Text

Pruthwik Mishra, Vandan Mujadia


Abstract
This paper presents the results of experiments conducted as part of the MADAR Shared Task at WANLP 2019 on Arabic Fine-Grained Dialect Identification. Dialect identification is a prominent task in natural language processing because downstream language modules can be improved based on its output. We explored features such as character and word n-grams and language model probabilities, among others, with different classifiers. Results show that these features help improve dialect classification accuracy. Results also show that traditional machine learning classifiers tend to perform better than neural network models on this task in a low-resource setting.
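To make the feature types named in the abstract concrete, here is a minimal sketch of character and word n-gram features fed to a traditional classifier. It is an illustration only: the abstract does not say which classifiers or n-gram ranges the authors used, so the multinomial Naive Bayes model, the n-gram ranges, the helper names, and the toy sentences and labels are all assumptions.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Character n-grams of length n_min..n_max over the space-padded text."""
    padded = f" {text} "
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def word_ngrams(text, n_min=1, n_max=2):
    """Word n-grams of length n_min..n_max over whitespace tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def featurize(text):
    """Combine both feature types; prefixes keep the two spaces separate."""
    return (["c:" + g for g in char_ngrams(text)]
            + ["w:" + g for g in word_ngrams(text)])


class NaiveBayesDialectClassifier:
    """Multinomial Naive Bayes with add-one smoothing (an assumed stand-in
    for the 'traditional machine learning classifier' in the abstract)."""

    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            feats = featurize(sentence)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, sentence):
        feats = featurize(sentence)
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + vocab_size
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(count / total)
            for feat in feats:
                score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy placeholder data, not taken from the MADAR corpus.
train_sentences = ["aaa bbb", "aab abb", "xxx yyy", "xyy xxy"]
train_labels = ["dialect_A", "dialect_A", "dialect_B", "dialect_B"]
model = NaiveBayesDialectClassifier().fit(train_sentences, train_labels)
prediction = model.predict("aab bbb")
```

Character n-grams capture sub-word spelling patterns that can separate closely related dialects, while word n-grams capture lexical choices; combining both in one feature space is a common design for this kind of task.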
Anthology ID:
W19-4628
Volume:
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month:
August
Year:
2019
Address:
Florence, Italy
Editors:
Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
Publisher:
Association for Computational Linguistics
Pages:
234–238
URL:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/W19-4628/
DOI:
10.18653/v1/W19-4628
Cite (ACL):
Pruthwik Mishra and Vandan Mujadia. 2019. Arabic Dialect Identification for Travel and Twitter Text. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 234–238, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Arabic Dialect Identification for Travel and Twitter Text (Mishra & Mujadia, WANLP 2019)
PDF:
https://preview.aclanthology.org/sigedu-bea-out-of-sync-correction/W19-4628.pdf