Abstract
This paper presents the results of experiments conducted as part of the MADAR Shared Task at WANLP 2019 on Arabic Fine-Grained Dialect Identification. Dialect identification is a prominent task in natural language processing, since its output can be used to improve subsequent language modules. We explored features such as character and word n-grams and language model probabilities with different classifiers. Results show that these features help to improve dialect classification accuracy. Results also show that traditional machine learning classifiers tend to perform better than neural network models on this task in a low-resource setting.
- Anthology ID:
- W19-4628
- Volume:
- Proceedings of the Fourth Arabic Natural Language Processing Workshop
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 234–238
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/W19-4628/
- DOI:
- 10.18653/v1/W19-4628
- Cite (ACL):
- Pruthwik Mishra and Vandan Mujadia. 2019. Arabic Dialect Identification for Travel and Twitter Text. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 234–238, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Arabic Dialect Identification for Travel and Twitter Text (Mishra & Mujadia, WANLP 2019)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/W19-4628.pdf