Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?

Perla Al Almaoui, Pierrette Bouillon, Simon Hengchen


Abstract
In an era of rapid technological advancements, communication continues to evolve as new linguistic phenomena emerge. Among these is Arabizi, a hybrid form of Arabic that incorporates Latin characters and numbers to represent the spoken dialects of Arab communities. Arabizi is widely used on social media and allows people to communicate in an informal and dynamic way, but it poses significant challenges for machine translation due to its lack of formal structure and deeply embedded cultural nuances. This case study is motivated by a growing need to translate Arabizi for gisting purposes. It evaluates the capacity of different LLMs to decode and translate Arabizi, focusing on multiple Arabic dialects that have rarely been studied until now. Using a combination of human evaluators and automatic metrics, this research project investigates the models’ performance in translating Arabizi into both Modern Standard Arabic and English. Key questions explored include which dialects are translated most effectively and whether translations into English surpass those into Arabic.
Anthology ID:
2025.mtsummit-2.4
Volume:
Proceedings of Machine Translation Summit XX: Volume 2
Month:
June
Year:
2025
Address:
Geneva, Switzerland
Editors:
Pierrette Bouillon, Johanna Gerlach, Sabrina Girletti, Lise Volkart, Raphael Rubino, Rico Sennrich, Samuel Läubli, Martin Volk, Miquel Esplà-Gomis, Vincent Vandeghinste, Helena Moniz, Sara Szoc
Venue:
MTSummit
Publisher:
European Association for Machine Translation
Pages:
28–41
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-2.4/
Cite (ACL):
Perla Al Almaoui, Pierrette Bouillon, and Simon Hengchen. 2025. Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?. In Proceedings of Machine Translation Summit XX: Volume 2, pages 28–41, Geneva, Switzerland. European Association for Machine Translation.
Cite (Informal):
Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin? (Almaoui et al., MTSummit 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.mtsummit-2.4.pdf