Hani Elgabou


2017

Building Dialectal Arabic Corpora
Hani Elgabou | Dimitar Kazakov
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology

The aim of this research is to identify local Arabic dialects in texts from social media (Twitter) and link them to specific geographic areas. Dialect identification is studied as a subset of the task of language identification. The proposed method is based on unsupervised learning that uses lexical and geographic distance simultaneously. While this study focusses on Libyan dialects, the approach is general and could produce resources to support human translators and interpreters when dealing with vernaculars rather than standard Arabic.