A Neural Architecture for Dialectal Arabic Segmentation

Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, Kareem Darwish


Abstract
The automated processing of Arabic dialects is challenging due to the lack of spelling standards and the scarcity of annotated data and resources in general. Segmenting words into their constituent parts is an important processing building block. In this paper, we show how a segmenter can be trained with neural networks on only 350 annotated tweets, without any normalization, lexical features, or lexical resources. We treat segmentation as a sequence labeling problem at the character level. We show experimentally that our model can rival state-of-the-art methods that rely on additional resources.
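To illustrate what treating segmentation as character-level sequence labeling can look like, here is a minimal sketch. It is not the authors' implementation: the B/M/E/S tag set (begin, middle, end, single-character segment), the function names, and the transliterated example word are all assumptions made for illustration. The sketch only shows the data encoding, that is, how a segmented word maps to one tag per character and back; the neural tagger that would predict these tags is out of scope.

```python
from typing import List, Tuple


def segments_to_labels(segments: List[str]) -> List[Tuple[str, str]]:
    """Map a word's segments to (character, tag) pairs.

    Assumed tag set (hypothetical, for illustration only):
    B = first char of a segment, M = middle, E = last, S = one-char segment.
    """
    labeled: List[Tuple[str, str]] = []
    for seg in segments:
        if len(seg) == 1:
            labeled.append((seg, "S"))
            continue
        for i, ch in enumerate(seg):
            if i == 0:
                tag = "B"
            elif i == len(seg) - 1:
                tag = "E"
            else:
                tag = "M"
            labeled.append((ch, tag))
    return labeled


def labels_to_segments(labeled: List[Tuple[str, str]]) -> List[str]:
    """Rebuild segments from (character, tag) pairs; E and S close a segment."""
    segments: List[str] = []
    current = ""
    for ch, tag in labeled:
        current += ch
        if tag in ("E", "S"):
            segments.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without a closing tag
        segments.append(current)
    return segments


# Hypothetical transliterated word split into prefix + stem + suffix.
example = ["w", "ktb", "hA"]
pairs = segments_to_labels(example)
restored = labels_to_segments(pairs)
```

Under this encoding, a tagger that predicts one tag per character fully determines the segmentation, which is why no lexicon lookup is needed at decoding time in a setup of this kind.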
Anthology ID:
W17-1306
Volume:
Proceedings of the Third Arabic Natural Language Processing Workshop
Month:
April
Year:
2017
Address:
Valencia, Spain
Editors:
Nizar Habash, Mona Diab, Kareem Darwish, Wassim El-Hajj, Hend Al-Khalifa, Houda Bouamor, Nadi Tomeh, Mahmoud El-Haj, Wajdi Zaghouani
Venue:
WANLP
SIG:
SEMITIC
Publisher:
Association for Computational Linguistics
Pages:
46–54
URL:
https://aclanthology.org/W17-1306
DOI:
10.18653/v1/W17-1306
Cite (ACL):
Younes Samih, Mohammed Attia, Mohamed Eldesouki, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer, and Kareem Darwish. 2017. A Neural Architecture for Dialectal Arabic Segmentation. In Proceedings of the Third Arabic Natural Language Processing Workshop, pages 46–54, Valencia, Spain. Association for Computational Linguistics.
Cite (Informal):
A Neural Architecture for Dialectal Arabic Segmentation (Samih et al., WANLP 2017)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/W17-1306.pdf
Data
Egyptian Arabic Segmentation Dataset