Automatic Identification of Arabic Dialects

Mohamed Belgacem, Georges Antoniadis, Laurent Besacier


Abstract
In this work, automatic recognition of Arabic dialects is proposed. An acoustic survey of the proportion of vocalic intervals and the standard deviation of consonantal intervals in nine dialects (Tunisia, Morocco, Algeria, Egypt, Syria, Lebanon, Yemen, Golf’s Countries and Iraq) is performed using the platform Alize and Gaussian Mixture Models (GMM). The results show the complexity of the automatic identification of Arabic dialects since. No clear border can be found between the dialects, but a gradual transition between them. They can even vary slightly from one city to another. The existence of this gradual change is easy to understand: it corresponds to a human and social reality, to the contact, friendships forged and affinity in the environment more or less immediate of the individual. This document also raises questions about the classes or macro classes of Arabic dialects noticed from the confusion matrix and the design of the hierarchical tree obtained.
Anthology ID:
L10-1497
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/719_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Mohamed Belgacem, Georges Antoniadis, and Laurent Besacier. 2010. Automatic Identification of Arabic Dialects. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Automatic Identification of Arabic Dialects (Belgacem et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/719_Paper.pdf