The QMUL/HRBDT contribution to the NADI Arabic Dialect Identification Shared Task

Abdulrahman Aloraini, Massimo Poesio, Ayman Alhelbawy


Abstract
We present the Arabic dialect identification system that we used for the country-level subtask of the NADI challenge. Our model consists of three components: BiLSTM-CNN, character-level TF-IDF, and topic modeling features. We represent each tweet using these features and feed them into a deep neural network. We then add an effective heuristic that improves the overall performance. We achieved an F1-Macro score of 20.77% and an accuracy of 34.32% on the test set. The model was also evaluated on the Arabic Online Commentary dataset, achieving results better than the state of the art.
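The abstract reports two metrics that can differ considerably on the same predictions. As a minimal sketch of how they are computed, the Python below implements accuracy and macro-averaged F1 from scratch. The labels and predictions are toy values invented for illustration; they are not NADI data, and the function names are hypothetical rather than taken from the authors' code. Macro F1 gives every class equal weight, so classes that are never predicted correctly pull it well below accuracy when the label distribution is skewed.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class has no hits
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, imbalanced example (assumption: three hypothetical country labels).
gold = ["Egypt"] * 6 + ["Iraq"] * 2 + ["Oman"] * 2
pred = ["Egypt"] * 10  # a classifier that always predicts the majority class

print(f"accuracy = {accuracy(gold, pred):.2f}")
print(f"macro F1 = {macro_f1(gold, pred):.2f}")
```

In this toy case the majority-class predictor gets 6 of 10 items right, yet two of the three classes have an F1 of zero, so macro F1 is far lower than accuracy.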
Anthology ID:
2020.wanlp-1.31
Volume:
Proceedings of the Fifth Arabic Natural Language Processing Workshop
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venues:
COLING | WANLP
Publisher:
Association for Computational Linguistics
Pages:
295–301
URL:
https://aclanthology.org/2020.wanlp-1.31
Cite (ACL):
Abdulrahman Aloraini, Massimo Poesio, and Ayman Alhelbawy. 2020. The QMUL/HRBDT contribution to the NADI Arabic Dialect Identification Shared Task. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, pages 295–301, Barcelona, Spain (Online). Association for Computational Linguistics.
Cite (Informal):
The QMUL/HRBDT contribution to the NADI Arabic Dialect Identification Shared Task (Aloraini et al., WANLP 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.wanlp-1.31.pdf