Abed Qaddoumi
2026
Learning Stress in Arabic Low-Resource Settings
Abed Qaddoumi | Jordan Kodner | Owen Rambow | Salam Khalifa | Jeffrey Heinz
Proceedings of the Society for Computation in Linguistics 2026
We predict lexical stress in Arabic varieties using syllable structure (a sequence of CVs, with C for consonants and V for vowels). Our task is generation: given an unstressed input, the system outputs a stress-marked word. We compare four approaches: a grammar induction algorithm (BUFIA), a transformer-based neural network (NN), a rule-based method, and a frequency baseline. The models are evaluated across several low-resource settings by varying the training data size by words, structural type, and syllable count. BUFIA outperforms the neural network, especially when data are scarce. This supports grammar induction as an interpretable and sample-efficient alternative for learning stress.
Computational Benchmarks for Egyptian Arabic Child Directed Speech
Salam Khalifa | Abed Qaddoumi | Nizar Habash | Owen Rambow
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
We present AraBabyTalk-EGY, an enriched release of the Egyptian Arabic CHILDES corpus that opens the genre of child-adult interaction to modern Arabic NLP research. Starting from the original CHILDES recordings and IPA transcriptions of caregiver-child sessions, we (i) map each IPA token to fully diacritized Arabic script and (ii) add core part-of-speech tags and lemmas aligned with existing dialectal Arabic morphological resources. These layers yield ~26K annotated tokens suitable for both text- and speech-based NLP tasks. We provide benchmarks on morphological disambiguation and Arabic ASR. We also outline lexical and morphosyntactic differences between AraBabyTalk-EGY and general Egyptian Arabic resources, highlighting the value of genre-specific training data for language acquisition studies and Arabic speech technology.