Anass Allak


2025

pdf bib
Preserving Comorian Linguistic Heritage: Bidirectional Transliteration Between the Latin Alphabet and the Kamar-Eddine System
Abdou Mohamed Naira | Abdessalam Bahafid | Zakarya Erraji | Anass Allak | Mohamed Soibira Naoufal | Imade Benelallam
Proceedings of the 9th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2025)

The Comoros Islands, rich in linguistic diversity, are home to dialects derived from Swahili and influenced by Arabic. Historically, the Kamar-Eddine system, based on the Arabic alphabet, was one of the first writing systems used for Comorian. However, it has gradually been replaced by the Latin alphabet, even though numerous archival texts are written in this system, and older speakers continue to use it, highlighting its cultural and historical significance. In this article, we present Shialifube, a bidirectional transliteration tool between Latin and Arabic scripts, designed in accordance with the rules of the Kamar-Eddine system. To evaluate its performance, we applied a round-trip transliteration technique, achieving a word error rate of 14.84% and a character error rate of 9.56%. These results demonstrate the reliability of our system for complex tasks. Furthermore, Shialifube was tested in a practical case related to speech recognition, showcasing its potential in Natural Language Processing. This project serves as a bridge between tradition and modernity, contributing to the preservation of Comorian linguistic heritage while paving the way for better integration of local dialects into advanced technologies.