Injy Hamed


2020

pdf
Cairo Student Code-Switch (CSCS) Corpus: An Annotated Egyptian Arabic-English Corpus
Mohamed Balabel | Injy Hamed | Slim Abdennadher | Ngoc Thang Vu | Özlem Çetinoğlu
Proceedings of the Twelfth Language Resources and Evaluation Conference

Code-switching has become a prevalent phenomenon across many communities. It poses a challenge to NLP researchers, mainly due to the lack of available data needed for training and testing applications. In this paper, we introduce a new resource: a corpus of Egyptian- Arabic code-switch speech data that is fully tokenized, lemmatized and annotated for part-of-speech tags. Beside the corpus itself, we provide annotation guidelines to address the unique challenges of annotating code-switch data. Another challenge that we address is the fact that Egyptian Arabic orthography and grammar are not standardized.

pdf
ArzEn: A Speech Corpus for Code-switched Egyptian Arabic-English
Injy Hamed | Ngoc Thang Vu | Slim Abdennadher
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we present our ArzEn corpus, an Egyptian Arabic-English code-switching (CS) spontaneous speech corpus. The corpus is collected through informal interviews with 38 Egyptian bilingual university students and employees held in a soundproof room. A total of 12 hours are recorded, transcribed, validated and sentence segmented. The corpus is mainly designed to be used in Automatic Speech Recognition (ASR) systems, however, it also provides a useful resource for analyzing the CS phenomenon from linguistic, sociological, and psychological perspectives. In this paper, we first discuss the CS phenomenon in Egypt and the factors that gave rise to the current language. We then provide a detailed description on how the corpus was collected, giving an overview on the participants involved. We also present statistics on the CS involved in the corpus, as well as a summary to the effort exerted in the corpus development, in terms of number of hours required for transcription, validation, segmentation and speaker annotation. Finally, we discuss some factors contributing to the complexity of the corpus, as well as Arabic-English CS behaviour that could pose potential challenges to ASR systems.

2018

pdf
Collection and Analysis of Code-switch Egyptian Arabic-English Speech Corpus
Injy Hamed | Mohamed Elmahdy | Slim Abdennadher
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf
A Game with a Purpose for Automatic Detection of Children’s Speech Disabilities using Limited Speech Resources
Reem Salem | Mohamed Elmahdy | Slim Abdennadher | Injy Hamed
Proceedings of the 1st Workshop on Natural Language Processing and Information Retrieval associated with RANLP 2017

Speech therapists and researchers are becoming more concerned with the use of computer-based systems in the therapy of speech disorders. In this paper, we propose a computer-based game with a purpose (GWAP) for speech therapy of Egyptian speaking children suffering from Dyslalia. Our aim is to detect if a certain phoneme is pronounced correctly. An Egyptian Arabic speech corpus has been collected. A baseline acoustic model was trained using the Egyptian corpus. In order to benefit from existing large amounts of Modern Standard Arabic (MSA) resources, MSA acoustic models were adapted with the collected Egyptian corpus. An independent testing set that covers common speech disorders has been collected for Egyptian speakers. Results show that adapted acoustic models give better recognition accuracy which could be relied on in the game and that children show more interest in playing the game than in visiting the therapist. A noticeable progress in children Dyslalia appeared with the proposed system.