Elsayed Issa


2026

Large audio language models (LALMs) integrate audio representations with large language models to enable unified understanding of spoken content. Their capabilities have been increasingly investigated across several benchmarks; however, research on their use in rating L2 speech is still in its infancy. This study explores the ability of LALMs to score three global dimensions of L2 speech: foreign accentedness, comprehensibility, and intelligibility. Ninety audio samples produced by L2 speakers were rated by ten native-speaker raters and by five LALMs. Model performance was evaluated against the human composite mean using Pearson r, Spearman ρ, mean absolute error (MAE), and systematic bias, with the human leave-one-out correlation (r = .46–.73 across dimensions) serving as an empirical performance benchmark. The results showed that no LALM reached human-level performance on any dimension. Only one model, Gemini, achieved a significant correlation with human ratings of comprehensibility (r = .28, p < .01), while Qwen2-Audio showed a modest correlation on intelligibility (r = .32, p < .01). MAE ranged from 0.75 to 3.99 for accentedness (human: 1.24), 1.35 to 3.00 for comprehensibility (human: 1.24), and 12.03 to 15.43 for intelligibility (human: 8.49). All models exhibited systematic biases, with deviations ranging from -9.31 to +13.19 points. The paper concludes with a discussion of the implications for automated L2 speech assessment.
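The evaluation measures named in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's analysis code: systematic bias is taken here as the mean signed difference (model minus human composite), and the human leave-one-out benchmark is computed by correlating each rater with the mean of the remaining raters and taking a plain average of those r values (Fisher z averaging would be an alternative). All function names and the toy ratings are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson r between two equal-length score lists (undefined if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho computed as Pearson r on tie-averaged ranks."""
    return pearson(ranks(x), ranks(y))


def mae(model, human):
    """Mean absolute error of model scores against the human composite."""
    return mean(abs(m - h) for m, h in zip(model, human))


def bias(model, human):
    """Mean signed deviation; positive means the model rates higher (assumed definition)."""
    return mean(m - h for m, h in zip(model, human))


def composite_mean(rater_scores):
    """Per-sample mean across raters; rater_scores is a list of per-rater score lists."""
    return [mean(sample) for sample in zip(*rater_scores)]


def leave_one_out_r(rater_scores):
    """Plain average of r between each rater and the mean of all other raters."""
    rs = []
    for i, rater in enumerate(rater_scores):
        others = rater_scores[:i] + rater_scores[i + 1:]
        rs.append(pearson(rater, composite_mean(others)))
    return mean(rs)


# Toy data (made up): three raters and one model scoring five samples on a 1-9 scale.
raters = [
    [2, 5, 7, 3, 8],
    [3, 4, 8, 2, 9],
    [2, 6, 6, 4, 7],
]
model = [4, 4, 6, 5, 6]
human = composite_mean(raters)

print("MAE:", round(mae(model, human), 3))    # MAE: 1.533
print("bias:", round(bias(model, human), 3))  # bias: -0.067
print("Pearson r:", round(pearson(model, human), 3))
print("Spearman rho:", round(spearman(model, human), 3))
print("human LOO r:", round(leave_one_out_r(raters), 3))
```

Comparing a model's r against the leave-one-out r asks whether the model agrees with the human consensus at least as well as a typical individual rater does.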

2025

2021

This work investigates the value of augmenting recurrent neural networks with feature engineering for Subtask 1.2 of the Second Nuanced Arabic Dialect Identification (NADI) shared task: country-level dialectal Arabic (DA) identification. We compare a simple word-level LSTM using pretrained embeddings with an LSTM enhanced with embeddings of engineered linguistic features. Our results show that adding explicit features to the LSTM degrades performance. We attribute this performance loss to the bivalency of some linguistic items in some texts, the ubiquity of topics, and participant mobility.

2018

Arabic broken plurals illustrate an interesting phenomenon in Arabic morphology: they are formed by shifting the consonants of a word into a different syllable pattern, which changes the pattern of the word. This paper therefore examines Arabic broken plurals from the perspective of neural networks, using an OpenNMT experiment to better understand and interpret the behavior of these plurals, especially with respect to L2 acquisition. The results show that the model successfully predicts the Arabic template. However, it fails to predict certain consonants, such as the emphatics and the gutturals. This reinforces the view that these consonants are the most difficult sounds for L2 learners to acquire.