ALADAN at IWSLT25 Low-resource Arabic Dialectal Speech Translation Task
Josef Jon, Waad Ben Kheder, Andre Beyer, Claude Barras, Jean-Luc Gauvain
Abstract
We present our IWSLT 2025 submission for the low-resource track on North Levantine Arabic to English speech translation, building on our IWSLT 2024 efforts. We retain last year’s cascade ASR architecture, which combines a TDNN-F model and a Zipformer for the ASR step, but upgrade the Zipformer to the Zipformer-Large variant (253M vs. 66M parameters) to capture richer acoustic representations. For the MT part, to further alleviate data sparsity, we created a crowd-sourced parallel corpus covering five major Arabic dialects (Tunisian, Levantine, Moroccan, Algerian, Egyptian), curated through rigorous qualification and filtering. We show that using crowd-sourced data is feasible in low-resource scenarios, as we observe improved automatic evaluation metrics across all dialects. We also experimented with the dataset in a high-resource scenario, where we had access to a large, high-quality Levantine Arabic corpus from LDC. In this setting, adding the crowd-sourced data no longer improves the scores on the official validation set. Our final submission scores 20.0 BLEU on the official test set.
- Anthology ID:
- 2025.iwslt-1.24
- Volume:
- Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria (in-person and online)
- Editors:
- Elizabeth Salesky, Marcello Federico, Antonis Anastasopoulos
- Venues:
- IWSLT | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 252–259
- URL:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.iwslt-1.24/
- Cite (ACL):
- Josef Jon, Waad Ben Kheder, Andre Beyer, Claude Barras, and Jean-Luc Gauvain. 2025. ALADAN at IWSLT25 Low-resource Arabic Dialectal Speech Translation Task. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), pages 252–259, Vienna, Austria (in-person and online). Association for Computational Linguistics.
- Cite (Informal):
- ALADAN at IWSLT25 Low-resource Arabic Dialectal Speech Translation Task (Jon et al., IWSLT 2025)
- PDF:
- https://preview.aclanthology.org/acl25-workshop-ingestion/2025.iwslt-1.24.pdf