Rahma Boujelbane


2020

Parallel resources for Tunisian Arabic Dialect Translation
Saméh Kchaou | Rahma Boujelbane | Lamia Hadrich-Belguith
Proceedings of the Fifth Arabic Natural Language Processing Workshop

The difficulty of processing dialects is evident in the high cost of building a representative corpus, particularly for machine translation. Machine translation systems require large amounts of well-managed training data, which is a challenge in a low-resource setting such as the Tunisian Arabic dialect. In this paper, we present a data augmentation technique for creating a parallel corpus of Tunisian Arabic dialect, as written on social media, and standard Arabic, in order to build a Machine Translation (MT) model. The resulting corpus was used to build a sentence-based translation model. This model reached a BLEU score of 15.03% on a test set, compared with 13.27% when using the corpus without augmentation.

2014

A Conventional Orthography for Tunisian Arabic
Inès Zribi | Rahma Boujelbane | Abir Masmoudi | Mariem Ellouze | Lamia Belguith | Nizar Habash
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Tunisian Arabic is a dialect of the Arabic language spoken in Tunisia. It is an under-resourced language, with neither a standard orthography nor large collections of written text and dictionaries. In fact, there is no strict separation between Modern Standard Arabic, the official language of the government, media and education, and Tunisian Arabic; the two exist on a continuum dominated by mixed forms. In this paper, we present a conventional orthography for Tunisian Arabic, following a previous effort to develop a conventional orthography for Dialectal Arabic (CODA), which was demonstrated for Egyptian Arabic. We explain the design principles of CODA and provide a detailed description of its guidelines as applied to Tunisian Arabic.

2013

Building bilingual lexicon to create Dialect Tunisian corpora and adapt language model
Rahma Boujelbane | Mariem Ellouze Khemekhem | Siwar BenAyed | Lamia Hadrich Belguith
Proceedings of the Second Workshop on Hybrid Approaches to Translation

The Effects of Factorizing Root and Pattern Mapping in Bidirectional Tunisian - Standard Arabic Machine Translation
Ahmed Hamdi | Rahma Boujelbane | Nizar Habash | Alexis Nasr
Proceedings of Machine Translation Summit XIV: Papers

Mapping Rules for Building a Tunisian Dialect Lexicon and Generating Corpora
Rahma Boujelbane | Mariem Ellouze Khemekhem | Lamia Hadrich Belguith
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Translating verbs between MSA and arabic dialects through deep morphological analysis (Un système de traduction de verbes entre arabe standard et arabe dialectal par analyse morphologique profonde) [in French]
Ahmed Hamdi | Rahma Boujelbane | Nizar Habash | Alexis Nasr
Proceedings of TALN 2013 (Volume 1: Long Papers)

Generation of tunisian dialect corpora for adapting language models (Génération des corpus en dialecte tunisien pour la modélisation de langage d’un système de reconnaissance) [in French]
Rahma Boujelbane
Proceedings of RECITAL 2013