Abstract
This paper presents the participation of the Qatar University team in the MADAR shared task, which addresses sentence-level fine-grained Arabic Dialect Identification over 25 different Arabic dialects in addition to Modern Standard Arabic. Arabic Dialect Identification is not a trivial task, since different dialects share some features, e.g., the same character set and some vocabulary. We opted for a very simple approach in terms of extracted features and classification models: we use only word and character n-grams as features and Naïve Bayes models as classifiers. Surprisingly, this simple approach achieved non-naïve performance. The official results, reported on a held-out test set, show that our best submitted run identifies the dialect of a given sentence with an accuracy of 64.58%.
- Anthology ID:
- W19-4624
- Volume:
- Proceedings of the Fourth Arabic Natural Language Processing Workshop
- Month:
- August
- Year:
- 2019
- Address:
- Florence, Italy
- Venue:
- WANLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 214–218
- URL:
- https://aclanthology.org/W19-4624
- DOI:
- 10.18653/v1/W19-4624
- Cite (ACL):
- Sohaila Eltanbouly, May Bashendy, and Tamer Elsayed. 2019. Simple But Not Naïve: Fine-Grained Arabic Dialect Identification Using Only N-Grams. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 214–218, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Simple But Not Naïve: Fine-Grained Arabic Dialect Identification Using Only N-Grams (Eltanbouly et al., WANLP 2019)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W19-4624.pdf
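
The approach named in the abstract, word and character n-grams fed to a Naïve Bayes classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the n-gram orders, the add-one smoothing, the class name `NgramNaiveBayes`, the helper `extract_ngrams`, the labels `A` and `B`, and the transliterated toy sentences are all assumptions made up for illustration, and the sketch uses only the Python standard library.

```python
import math
from collections import Counter, defaultdict


def extract_ngrams(sentence, word_n=(1, 2), char_n=(1, 2, 3)):
    """Return word and character n-gram features for one sentence.

    The n-gram orders are illustrative defaults, not values from the paper.
    """
    tokens = sentence.split()
    feats = []
    for n in word_n:
        for i in range(len(tokens) - n + 1):
            feats.append("w:" + " ".join(tokens[i:i + n]))
    for n in char_n:
        for i in range(len(sentence) - n + 1):
            feats.append("c:" + sentence[i:i + n])
    return feats


class NgramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, sentences, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            feats = extract_ngrams(sentence)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        # Total number of feature occurrences observed per class.
        self.totals = {
            c: sum(self.feature_counts[c].values()) for c in self.class_counts
        }
        return self

    def predict(self, sentence):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.class_counts:
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(self.class_counts[label] / n_docs)
            denom = self.totals[label] + self.alpha * vocab_size
            for feat in extract_ngrams(sentence):
                if feat in self.vocab:  # features unseen in training are skipped
                    count = self.feature_counts[label][feat]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: invented transliterated strings, two made-up labels.
train_sentences = [
    "izzayak ya basha",
    "izzayak enta fein",
    "kifak ya zalameh",
    "kifak inta wein",
]
train_labels = ["A", "A", "B", "B"]

model = NgramNaiveBayes().fit(train_sentences, train_labels)
prediction_a = model.predict("izzayak fein")
prediction_b = model.predict("kifak wein")
```

Character n-grams let the classifier pick up sub-word cues even when a whole word was never seen in training, which is one plausible reason such a simple feature set can be competitive on closely related dialects.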