Zero-Shot Context-Aware ASR for Diverse Arabic Varieties

Bashar Talafha; Amin Abu Alhassan; Muhammad Abdul-Mageed

Abstract

Zero-shot ASR for Arabic remains challenging: while multilingual models perform well on Modern Standard Arabic (MSA), error rates rise sharply on dialectal and accented speech due to linguistic mismatch and scarce labeled data. We study context-aware decoding as a lightweight test-time adaptation paradigm that conditions inference on external side information without parameter updates. For promptable encoder–decoder ASR (e.g., Whisper), we incorporate context through (i) decoder prompting with first-pass hypotheses and (ii) encoder/decoder prefixing with retrieved speech-text exemplars, complemented by simple prompt reordering and optional speaker-matched synthetic exemplars to improve robustness in informal and multi-speaker settings. To extend contextual adaptation beyond promptable architectures, we introduce proxy-guided n-best selection for CTC ASR: given one or more external proxy hypotheses, we select from a model’s n-best list by minimizing text-level distance to the proxies, enabling contextual inference without direct prompting. Across ten Arabic conditions spanning MSA, accented MSA, and multiple dialects, the best-performing context-aware variants yield average relative WER reductions of 22.29% on MSA, 20.54% on accented MSA, and 9.15% on dialectal Arabic. For CTC ASR on our Common Voice MSA testbed, proxy-guided selection reduces WER by 15.6% relative and recovers a substantial fraction of oracle n-best gains, showing that external-context guidance can also benefit non-promptable ASR.
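
The proxy-guided n-best selection described above can be illustrated with a minimal sketch. The abstract does not specify the text-level distance, so this example assumes word-level Levenshtein distance summed over all proxies, with ties broken in favor of the higher-ranked hypothesis. The function names `word_edit_distance` and `select_from_nbest` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        curr = [i]
        for j, token_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if token_a == token_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def select_from_nbest(nbest: List[str], proxies: List[str]) -> str:
    """Pick the n-best hypothesis closest to the external proxy hypotheses.

    Assumption: closeness is the summed word-level edit distance to all
    proxies; the paper's actual distance and tie-breaking may differ.
    """
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    if not proxies:
        # Without proxies there is no guidance; fall back to the top hypothesis.
        return nbest[0]

    proxy_tokens = [proxy.split() for proxy in proxies]

    def total_distance(hypothesis: str) -> int:
        tokens = hypothesis.split()
        return sum(word_edit_distance(tokens, proxy) for proxy in proxy_tokens)

    # min() returns the first minimal item, so ties keep the original ranking.
    return min(nbest, key=total_distance)


if __name__ == "__main__":
    nbest = [
        "the cat sat on a mat",
        "the cat sat on the mat",
        "a cat sat on the hat",
    ]
    proxies = ["the cat sat on the mat today"]
    print(select_from_nbest(nbest, proxies))
```

Because the selection only compares text, it needs no access to the CTC model's internals, which is consistent with the abstract's point that the approach works without direct prompting.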

Anthology ID: 2026.findings-acl.1296
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 26029–26044
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1296/
Cite (ACL): Bashar Talafha, Amin Abu Alhassan, and Muhammad Abdul-Mageed. 2026. Zero-Shot Context-Aware ASR for Diverse Arabic Varieties. In Findings of the Association for Computational Linguistics: ACL 2026, pages 26029–26044, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Zero-Shot Context-Aware ASR for Diverse Arabic Varieties (Talafha et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1296.pdf
Checklist: 2026.findings-acl.1296.checklist.pdf