Abstract
This paper outlines the process of training the AraT5-MSAizer model, a transformer-based neural machine translation model aimed at translating five regional Arabic dialects into Modern Standard Arabic (MSA). Developed for Task 2 of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools, the model attained a BLEU score of 21.79% on the test set associated with this task.

- Anthology ID:
- 2024.osact-1.16
- Volume:
- Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Hend Al-Khalifa, Kareem Darwish, Hamdy Mubarak, Mona Ali, Tamer Elsayed
- Venues:
- OSACT | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 124–129
- URL:
- https://aclanthology.org/2024.osact-1.16
- Cite (ACL):
- Murhaf Fares. 2024. AraT5-MSAizer: Translating Dialectal Arabic to MSA. In Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024, pages 124–129, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- AraT5-MSAizer: Translating Dialectal Arabic to MSA (Fares, OSACT-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.osact-1.16.pdf