Abstract
This paper addresses the challenges of automated transliteration and code-switching detection in Judeo-Arabic texts. We introduce two novel machine-learning models: one transliterates Judeo-Arabic, which is typically written in Hebrew script, into Arabic script, and the other identifies non-Arabic words, predominantly Hebrew and Aramaic. Unlike prior work, our models are based on a bilingual Arabic-Hebrew language model, which gives them an advantage in capturing linguistic features shared by the two languages. Evaluation results show that our models outperform prior solutions for the same tasks. As a practical contribution, we present a complete pipeline that takes Judeo-Arabic text, identifies the non-Arabic words, and then transliterates the Arabic portions into Arabic script. Beyond advancing the state of the art, this work offers a toolset for making Judeo-Arabic texts accessible to a broader Arabic-speaking audience.
- Anthology ID: 2024.findings-eacl.102
- Volume: Findings of the Association for Computational Linguistics: EACL 2024
- Month: March
- Year: 2024
- Address: St. Julian’s, Malta
- Editors: Yvette Graham, Matthew Purver
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 1501–1511
- URL: https://aclanthology.org/2024.findings-eacl.102
- Cite (ACL): Daniel Weisberg Mitelman, Nachum Dershowitz, and Kfir Bar. 2024. Code-Switching and Back-Transliteration Using a Bilingual Model. In Findings of the Association for Computational Linguistics: EACL 2024, pages 1501–1511, St. Julian’s, Malta. Association for Computational Linguistics.
- Cite (Informal): Code-Switching and Back-Transliteration Using a Bilingual Model (Weisberg Mitelman et al., Findings 2024)
- PDF: https://preview.aclanthology.org/proper-vol2-ingestion/2024.findings-eacl.102.pdf
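The two-stage pipeline described in the abstract (detect non-Arabic words first, then transliterate only the Arabic portions) can be sketched as follows. This is a minimal illustration under explicit assumptions and is not the authors' code. The paper's detector and transliterator are trained models built on a bilingual Arabic-Hebrew language model. Here both are replaced by hypothetical rule-based stand-ins: a small lexicon (`non_arabic_lexicon`) for code-switching detection and a naive one-to-one Hebrew-to-Arabic letter map (`HEBREW_TO_ARABIC`) for transliteration. All names are illustrative. The letter map deliberately ignores context-dependent cases, such as letters marked with a diacritic (e.g. ת׳ for ث) and word-final ה standing for ة. Those are exactly the cases where a learned model is needed.

```python
from typing import AbstractSet

# HYPOTHETICAL, deliberately naive letter map: one Arabic letter per Hebrew
# letter (final forms included). It ignores diacritic-marked letters such as
# ת׳ (for ث) and word-final ה standing for ة.
HEBREW_TO_ARABIC = {
    "א": "ا", "ב": "ب", "ג": "ج", "ד": "د", "ה": "ه", "ו": "و", "ז": "ز",
    "ח": "ح", "ט": "ط", "י": "ي", "כ": "ك", "ך": "ك", "ל": "ل", "מ": "م",
    "ם": "م", "נ": "ن", "ן": "ن", "ס": "س", "ע": "ع", "פ": "ف", "ף": "ف",
    "צ": "ص", "ץ": "ص", "ק": "ق", "ר": "ر", "ש": "ش", "ת": "ت",
}
# str.maketrans accepts a dict whose keys are single characters.
TRANSLIT_TABLE = str.maketrans(HEBREW_TO_ARABIC)


def is_non_arabic(token: str, non_arabic_lexicon: AbstractSet[str]) -> bool:
    """Stage 1 stand-in: flag Hebrew/Aramaic words by lexicon lookup."""
    return token in non_arabic_lexicon


def transliterate(token: str) -> str:
    """Stage 2 stand-in: replace each Hebrew letter with an Arabic letter."""
    return token.translate(TRANSLIT_TABLE)


def run_pipeline(text: str, non_arabic_lexicon: AbstractSet[str]) -> str:
    """Keep flagged words in Hebrew script; transliterate all other tokens."""
    out = []
    for token in text.split():
        if is_non_arabic(token, non_arabic_lexicon):
            out.append(token)  # code-switched word: left as written
        else:
            out.append(transliterate(token))  # Arabic word: Hebrew -> Arabic script
    return " ".join(out)


# Usage: "כתאב" (Arabic "book" in Hebrew script) is transliterated,
# while the Hebrew word "ברוך" is flagged and kept as-is.
print(run_pipeline("כתאב ברוך", {"ברוך"}))
```

Running detection before transliteration follows the order given in the abstract: Hebrew and Aramaic words are never forced into Arabic script. In the usage line, כתאב becomes كتاب and ברוך is left untouched.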