A Multi-Task Learning Approach to Dialectal Arabic Identification and Translation to Modern Standard Arabic

Abdullah Khered, Youcef Benkhedda, Riza Batista-Navarro


Abstract
Translating Dialectal Arabic (DA) into Modern Standard Arabic (MSA) is a complex task due to the linguistic diversity and informal nature of dialects, particularly in social media texts. To improve translation quality, we propose a Multi-Task Learning (MTL) framework that combines DA-MSA translation as the primary task with dialect identification as an auxiliary task. Additionally, we introduce LahjaTube, a new corpus collected from YouTube that contains DA transcripts with corresponding MSA and English translations and covers four major Arabic dialects: Egyptian (EGY), Gulf (GLF), Levantine (LEV), and Maghrebi (MGR). We evaluate AraT5 and AraBART on the Dial2MSA-Verified dataset under Single-Task Learning (STL) and MTL setups. Our results show that adopting the MTL framework and incorporating LahjaTube into the training data improve translation performance, yielding a BLEU score gain of 2.65 points over baseline models.
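The MTL setup described in the abstract pairs a primary translation objective with an auxiliary dialect identification objective. The abstract does not say how the two objectives are combined, so the following is only a minimal sketch of one common choice, a weighted sum of two cross-entropy losses. The function names, the `aux_weight` parameter, and its default value are assumptions made for illustration and are not taken from the paper.

```python
import math

# Dialect labels as listed in the abstract.
DIALECTS = ["EGY", "GLF", "LEV", "MGR"]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target class under softmax(logits)."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[target_index]


def translation_loss(token_logits, target_ids):
    """Mean token-level cross-entropy over the MSA target sequence."""
    losses = [cross_entropy(logits, t) for logits, t in zip(token_logits, target_ids)]
    return sum(losses) / len(losses)


def multi_task_loss(token_logits, target_ids, dialect_logits, dialect_label,
                    aux_weight=0.5):
    """Hypothetical combined objective: primary loss plus weighted auxiliary loss.

    aux_weight is an assumed hyperparameter, not a value reported in the paper.
    """
    primary = translation_loss(token_logits, target_ids)
    auxiliary = cross_entropy(dialect_logits, DIALECTS.index(dialect_label))
    return primary + aux_weight * auxiliary
```

Under this sketch, setting `aux_weight` to 0 recovers a single-task (STL) translation objective, which makes the two setups directly comparable in a training loop.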
Anthology ID:
2025.lowresnlp-1.4
Volume:
Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages
Month:
September
Year:
2025
Address:
Varna, Bulgaria
Editors:
Ernesto Luis Estevanell-Valladares, Alicia Picazo-Izquierdo, Tharindu Ranasinghe, Besik Mikaberidze, Simon Ostermann, Daniil Gurgurov, Philipp Mueller, Claudia Borg, Marián Šimko
Venues:
LowResNLP | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
21–31
URL:
https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.4/
Cite (ACL):
Abdullah Khered, Youcef Benkhedda, and Riza Batista-Navarro. 2025. A Multi-Task Learning Approach to Dialectal Arabic Identification and Translation to Modern Standard Arabic. In Proceedings of the First Workshop on Advancing NLP for Low-Resource Languages, pages 21–31, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
A Multi-Task Learning Approach to Dialectal Arabic Identification and Translation to Modern Standard Arabic (Khered et al., LowResNLP 2025)
PDF:
https://preview.aclanthology.org/corrections-2026-01/2025.lowresnlp-1.4.pdf