MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge)

Dhaou Ghoul; Gaël Lejeune

doi:10.18653/v1/W19-4627

Abstract

We present MICHAEL, a simple, lightweight method for automatic Arabic dialect identification on the MADAR travel domain Dialect Identification (DID) task. MICHAEL uses simple character-level features to perform classification without any pre-processing. More precisely, character n-grams extracted from the original sentences are used to train a Multinomial Naive Bayes classifier. The system achieved an official score (accuracy) of 53.25% with 1 ≤ N ≤ 3, but showed a much better result with character 4-grams (62.17% accuracy).
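The pipeline described in the abstract (raw sentences, character n-grams, Multinomial Naive Bayes) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the names `char_ngrams` and `CharNgramNB`, the add-one smoothing, and the two labels `CAI` and `BEI` are assumptions made for illustration, and the four training sentences are invented toy examples, not MADAR data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of `text` for n_min <= n <= n_max.

    The text is used as-is (no normalization or tokenization), so spaces
    and diacritics are treated like any other character.
    """
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


class CharNgramNB:
    """Multinomial Naive Bayes over character n-gram counts.

    Hypothetical helper class; `alpha` is the additive smoothing constant.
    """

    def __init__(self, n_min=1, n_max=3, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, sentences, labels):
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter(labels)         # label -> number of sentences
        for sentence, label in zip(sentences, labels):
            self.counts[label].update(
                char_ngrams(sentence, self.n_min, self.n_max))
        self.vocab = set()
        for label_counts in self.counts.values():
            self.vocab.update(label_counts)
        self.totals = {label: sum(c.values())
                       for label, c in self.counts.items()}
        return self

    def predict(self, sentence):
        # n-grams never seen in training are ignored at prediction time.
        grams = [g for g in char_ngrams(sentence, self.n_min, self.n_max)
                 if g in self.vocab]
        n_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.docs):
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.docs[label] / n_docs)
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for g in grams:
                score += math.log(
                    (self.counts[label][g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with invented sentences and illustrative labels.
train_sentences = ["عايز اروح المطار", "ازيك يا صاحبي",
                   "بدي روح عالمطار", "كيفك يا صاحبي"]
train_labels = ["CAI", "CAI", "BEI", "BEI"]

model = CharNgramNB(n_min=1, n_max=3).fit(train_sentences, train_labels)
print(model.predict("عايز تذكرة"))
```

Setting `n_min=4, n_max=4` in this sketch corresponds to the character 4-gram configuration mentioned in the abstract, while `n_min=1, n_max=3` corresponds to the officially submitted range.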

Anthology ID: W19-4627
Volume: Proceedings of the Fourth Arabic Natural Language Processing Workshop
Month: August
Year: 2019
Address: Florence, Italy
Editors: Wassim El-Hajj, Lamia Hadrich Belguith, Fethi Bougares, Walid Magdy, Imed Zitouni, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue: WANLP
Publisher: Association for Computational Linguistics
Pages: 229–233
URL: https://preview.aclanthology.org/iwcs-25-ingestion/W19-4627/
DOI: 10.18653/v1/W19-4627
Cite (ACL): Dhaou Ghoul and Gaël Lejeune. 2019. MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge). In Proceedings of the Fourth Arabic Natural Language Processing Workshop, pages 229–233, Florence, Italy. Association for Computational Linguistics.
Cite (Informal): MICHAEL: Mining Character-level Patterns for Arabic Dialect Identification (MADAR Challenge) (Ghoul & Lejeune, WANLP 2019)
PDF: https://preview.aclanthology.org/iwcs-25-ingestion/W19-4627.pdf
