Assessment of L2 speech global dimensions using large audio language models

Elsayed Issa, Mahmoud Ali


Abstract
Large audio language models (LALMs) integrate audio representations with large language models to enable unified understanding of spoken content. Their capabilities have been increasingly investigated across several benchmarks; however, the examination of their use in rating L2 speech is still in its infancy. This study explores the abilities of LALMs in scoring three L2 speech global dimensions: foreign accentedness, comprehensibility, and intelligibility. Ninety audio samples produced by L2 speakers were rated by ten native speaker raters as well as five LALM models. Model performance was evaluated against the human composite mean using Pearson r, Spearman p, mean absolute error (MAE), and systematic bias, with the human leave-one-out correlation (r = .46-.73 across dimensions) serving as an empirical performance benchmark. The results showed that no LALM reached human-level performance on any dimension. Only one model (i.e., Gemini) achieved a significant correlation with human ratings on comprehensibility (r = .28, p < .01), while Qwen2-Audio showed modest correlation on intelligibility (r = .32, p < .01). MAE ranged from 0.75 to 3.99 for accentedness (human: 1.24), 1.35 to 3.00 for comprehensibility (human: 1.24), and 12.03 to 15.43 for intelligibility (human: 8.49). All models exhibited systematic biases, with deviations ranging from -9.31 to +13.19 points. The paper concludes with a discussion of the implications for automated L2 speech assessment.
Anthology ID:
2026.bea-1.49
Volume:
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Bashar Alhafni, Stefano Bannò, Marie Bexte, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Anais Tack, Victoria Yaneva, Zheng Yuan
Venues:
BEA | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
702–714
Language:
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.49/
DOI:
Bibkey:
Cite (ACL):
Elsayed Issa and Mahmoud Ali. 2026. Assessment of L2 speech global dimensions using large audio language models. In Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026), pages 702–714, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Assessment of L2 speech global dimensions using large audio language models (Issa & Ali, BEA 2026)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.bea-1.49.pdf