Zero-Shot Context-Aware ASR for Diverse Arabic Varieties

Bashar Talafha, Amin Abu Alhassan, Muhammad Abdul-Mageed


Abstract
Zero-shot ASR for Arabic remains challenging: while multilingual models perform well on Modern Standard Arabic (MSA), error rates rise sharply on dialectal and accented speech due to linguistic mismatch and scarce labeled data. We study context-aware decoding as a lightweight test-time adaptation paradigm that conditions inference on external side information without parameter updates. For promptable encoder–decoder ASR (e.g., Whisper), we incorporate context through (i) decoder prompting with first-pass hypotheses and (ii) encoder/decoder prefixing with retrieved speech-text exemplars, complemented by simple prompt reordering and optional speaker-matched synthetic exemplars to improve robustness in informal and multi-speaker settings. To extend contextual adaptation beyond promptable architectures, we introduce proxy-guided n-best selection for CTC ASR: given one or more external proxy hypotheses, we select from a model’s n-best list by minimizing text-level distance to the proxies, enabling contextual inference without direct prompting. Across ten Arabic conditions spanning MSA, accented MSA, and multiple dialects, the best-performing context-aware variants yield average relative WER reductions of 22.29% on MSA, 20.54% on accented MSA, and 9.15% on dialectal Arabic. For CTC ASR on our Common Voice MSA testbed, proxy-guided selection reduces WER by 15.6% relative and recovers a substantial fraction of oracle n-best gains, showing that external-context guidance can also benefit non-promptable ASR.
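The abstract describes proxy-guided n-best selection only at a high level: from a CTC model's n-best list, pick the hypothesis that is closest in text to one or more external proxy hypotheses. The sketch below shows that idea under two assumptions the abstract does not state. First, the "text-level distance" is taken to be word-level edit distance. Second, distances to multiple proxies are combined by summing them. The function names `word_edit_distance` and `proxy_guided_select` and the toy strings are hypothetical; they are not taken from the paper or its code.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over word tokens (substitutions, insertions, deletions)."""
    # prev is the previous DP row: distances from a ref prefix to every hyp prefix
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]


def proxy_guided_select(nbest: List[str], proxies: List[str]) -> str:
    """Return the n-best hypothesis with the smallest total distance to the proxies.

    Assumptions (not from the paper): whitespace tokenization, word-level
    edit distance, and summation over proxies.
    """
    if not nbest:
        raise ValueError("n-best list is empty")
    if not proxies:
        return nbest[0]  # no external context: keep the model's 1-best
    proxy_tokens = [p.split() for p in proxies]

    def score(hyp: str) -> int:
        tokens = hyp.split()
        return sum(word_edit_distance(p, tokens) for p in proxy_tokens)

    # min() returns the first minimal element, so ties go to the higher-ranked hypothesis
    return min(nbest, key=score)


if __name__ == "__main__":
    nbest = ["the cat sat on a mat", "the cat sat on the mat", "a cat sat on the hat"]
    proxies = ["the cat sat on the mat", "the cat sits on the mat"]
    print(proxy_guided_select(nbest, proxies))  # prints "the cat sat on the mat"
```

Summing distances favors the hypothesis that agrees with the proxies collectively. Other choices, such as a minimum over proxies or a character-level distance (which may better tolerate Arabic orthographic variation), are equally plausible, and the paper's actual metric may differ.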
Anthology ID: 2026.findings-acl.1296
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 26029–26044
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1296/
Cite (ACL): Bashar Talafha, Amin Abu Alhassan, and Muhammad Abdul-Mageed. 2026. Zero-Shot Context-Aware ASR for Diverse Arabic Varieties. In Findings of the Association for Computational Linguistics: ACL 2026, pages 26029–26044, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Zero-Shot Context-Aware ASR for Diverse Arabic Varieties (Talafha et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1296.pdf
Checklist: 2026.findings-acl.1296.checklist.pdf