Abstract
The Arabic Online Commentary (AOC) (Zaidan and Callison-Burch, 2011) is a large-scale repository of Arabic dialects with manual labels for 4 varieties of the language. Existing dialect identification models exploiting the dataset pre-date the recent boost deep learning brought to NLP, and hence the data are not benchmarked for use with deep learning, nor is it clear how much neural networks can help tease the categories in the data apart. We address these two limitations: we (1) benchmark the data, and (2) empirically test 6 different deep learning methods on the task, comparing performance to several classical machine learning models under different conditions (i.e., both binary and multi-way classification). Our experimental results show that variants of (attention-based) bidirectional recurrent neural networks achieve the best accuracy (acc) on the task, significantly outperforming all competitive baselines. On blind test data, our models reach 87.65% acc on the binary task (MSA vs. dialects), 87.4% acc on the 3-way dialect task (Egyptian vs. Gulf vs. Levantine), and 82.45% acc on the 4-way variants task (MSA vs. Egyptian vs. Gulf vs. Levantine). We release our benchmark for future work on the dataset.
- Anthology ID:
- W18-3930
- Volume:
- Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Venue:
- VarDial
- Publisher:
- Association for Computational Linguistics
- Pages:
- 263–274
- URL:
- https://aclanthology.org/W18-3930
- Cite (ACL):
- Mohamed Elaraby and Muhammad Abdul-Mageed. 2018. Deep Models for Arabic Dialect Identification on Benchmarked Data. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 263–274, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Deep Models for Arabic Dialect Identification on Benchmarked Data (Elaraby & Abdul-Mageed, VarDial 2018)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/W18-3930.pdf