A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic

Juan Moreno Gonzalez, Bashar Alhafni, Nizar Habash


Abstract
Judeo-Arabic refers to Arabic varieties historically spoken by Jewish communities across the Arab world, primarily during the Middle Ages. Unlike standard Arabic, it is written in Hebrew script, by Jewish writers for Jewish audiences. Transliterating Judeo-Arabic into Arabic script is challenging due to ambiguous letter mappings, inconsistent orthographic conventions, and frequent code-switching into Hebrew. In this paper, we introduce a two-step approach to automatically transliterate Judeo-Arabic into Arabic script: a simple character-level mapping followed by post-correction to fix grammatical and orthographic errors. We also present the first benchmark evaluation of LLMs on this task. Finally, we show that transliteration enables Arabic NLP tools to perform morphosyntactic tagging and machine translation, which would not have been feasible on the original texts. We make our code and data publicly available.
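The first step, character-level mapping, can be pictured as a lookup table applied letter by letter. The sketch below is only an illustration under stated assumptions: the table is a hypothetical, partial one built from common Hebrew-to-Arabic letter correspondences (not the authors' mapping), the name `char_map` is ours, and a geresh (׳) or apostrophe after a letter is assumed to mark the dotted Arabic counterpart, although manuscripts follow competing conventions. The post-correction step is not shown.

```python
# Hypothetical, partial Hebrew-to-Arabic letter table (an assumption for this
# sketch, not the authors' mapping). Hebrew final forms map like base letters.
BASE_MAP = {
    "א": "ا", "ב": "ب", "ג": "ج", "ד": "د", "ה": "ه", "ו": "و",
    "ז": "ز", "ח": "ح", "ט": "ط", "י": "ي", "כ": "ك", "ך": "ك",
    "ל": "ل", "מ": "م", "ם": "م", "נ": "ن", "ן": "ن", "ס": "س",
    "ע": "ع", "פ": "ف", "ף": "ف", "צ": "ص", "ץ": "ص", "ק": "ق",
    "ر" if False else "ר": "ر", "ש": "ش", "ת": "ت",
}

# Assumed convention: a geresh (U+05F3) or ASCII apostrophe after one of these
# letters selects the dotted Arabic letter instead of the plain one.
MARKS = {"\u05f3", "'"}
DOTTED_MAP = {
    "ג": "غ", "ד": "ذ", "ט": "ظ", "כ": "خ", "ך": "خ",
    "צ": "ض", "ץ": "ض", "ת": "ث",
}


def char_map(text: str) -> str:
    """Map Hebrew-script Judeo-Arabic to Arabic script, one letter at a time."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt in MARKS and ch in DOTTED_MAP:
            out.append(DOTTED_MAP[ch])
            i += 2  # consume the letter and its diacritic mark together
            continue
        # Characters outside the table (spaces, punctuation) pass through as-is.
        out.append(BASE_MAP.get(ch, ch))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(char_map("אלמדינה"))  # word-final ה becomes ه, not ة
```

Running it on אלמדינה yields المدينه, whereas standard Arabic spelling is المدينة: a fixed table cannot tell when word-final ה stands for ة, which is exactly the kind of residual error a post-correction step has to repair.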
Anthology ID: 2026.eacl-long.93
Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 2100–2113
URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.93/
Cite (ACL): Juan Moreno Gonzalez, Bashar Alhafni, and Nizar Habash. 2026. A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2100–2113, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): A Tale of Two Scripts: Transliteration and Post-Correction for Judeo-Arabic (Gonzalez et al., EACL 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.93.pdf