Rahma Boujelbane


2022

Standardisation of Dialect Comments in Social Networks in View of Sentiment Analysis : Case of Tunisian Dialect
Saméh Kchaou | Rahma Boujelbane | Emna Fsih | Lamia Hadrich-Belguith
Proceedings of the Thirteenth Language Resources and Evaluation Conference

With growing access to the internet, spoken Arabic dialects have become informal written languages on social media. Most users post comments in their own dialect. This linguistic situation inhibits mutual understanding between internet users and makes it difficult to use computational approaches, since most Arabic resources are intended for the formal language, Modern Standard Arabic (MSA). In this paper, we present a pipeline that standardizes written texts in social networks by translating them into MSA. We first fine-tune a BERT-based identification model to distinguish Tunisian Dialect (TD) from MSA and other dialects. We then train a transformer model to translate TD into MSA. The output of the final system includes the translated TD text and the text originally written in MSA. Each of these steps was evaluated on the same test corpus. To test the effectiveness of the approach, we compared two opinion analysis models, the first intended for Sentiment Analysis (SA) of dialect texts and the second for MSA texts. We concluded that standardization yields the best score.
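The routing logic described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `standardize`, the label strings "TD", "MSA" and "OTHER", and the toy stand-ins for the identification and translation models are all hypothetical. It also assumes that comments in other dialects are simply dropped, which the abstract does not state explicitly.

```python
from typing import Callable, Iterable, List


def standardize(
    comments: Iterable[str],
    identify: Callable[[str], str],
    translate: Callable[[str], str],
) -> List[str]:
    """Return MSA text: TD comments are translated, MSA comments are kept.

    `identify` stands in for the fine-tuned identification model and
    `translate` for the TD-to-MSA translation model (both hypothetical).
    """
    standardized = []
    for text in comments:
        label = identify(text)
        if label == "TD":
            standardized.append(translate(text))
        elif label == "MSA":
            standardized.append(text)
        # Assumption: any other dialect is left out of the output.
    return standardized


# Toy stand-ins so the sketch runs without any trained model.
def toy_identify(text: str) -> str:
    if text.startswith("td:"):
        return "TD"
    if text.startswith("msa:"):
        return "MSA"
    return "OTHER"


def toy_translate(text: str) -> str:
    return "msa:" + text[len("td:"):]


result = standardize(["td:a", "msa:b", "eg:c"], toy_identify, toy_translate)
```

In a real setting the standardized list would then be passed to a sentiment analysis model trained on MSA text.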

2020

Parallel resources for Tunisian Arabic Dialect Translation
Saméh Kchaou | Rahma Boujelbane | Lamia Hadrich-Belguith
Proceedings of the Fifth Arabic Natural Language Processing Workshop

The difficulty of processing dialects is clearly reflected in the high cost of building a representative corpus, in particular for machine translation. Indeed, all machine translation systems require a large amount of well-managed training data, which is a challenge in a low-resource setting such as the Tunisian Arabic dialect. In this paper, we present a data augmentation technique to create a parallel corpus of the Tunisian Arabic dialect as written in social media and standard Arabic, in order to build a Machine Translation (MT) model. The created corpus was used to build a sentence-based translation model. This model reached a BLEU score of 15.03% on a test set, whereas it was limited to 13.27% when using the corpus without augmentation.

2014

A Conventional Orthography for Tunisian Arabic
Inès Zribi | Rahma Boujelbane | Abir Masmoudi | Mariem Ellouze | Lamia Belguith | Nizar Habash
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Tunisian Arabic is a dialect of the Arabic language spoken in Tunisia. It is an under-resourced language: it has neither a standard orthography nor large collections of written text and dictionaries. In fact, there is no strict separation between Modern Standard Arabic, the official language of the government, media and education, and Tunisian Arabic; the two exist on a continuum dominated by mixed forms. In this paper, we present a conventional orthography for Tunisian Arabic, following a previous effort on developing a conventional orthography for Dialectal Arabic (or CODA) demonstrated for Egyptian Arabic. We explain the design principles of CODA and provide a detailed description of its guidelines as applied to Tunisian Arabic.

2013

Mapping Rules for Building a Tunisian Dialect Lexicon and Generating Corpora
Rahma Boujelbane | Mariem Ellouze Khemekhem | Lamia Hadrich Belguith
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Building bilingual lexicon to create Dialect Tunisian corpora and adapt language model
Rahma Boujelbane | Mariem Ellouze Khemekhem | Siwar BenAyed | Lamia Hadrich Belguith
Proceedings of the Second Workshop on Hybrid Approaches to Translation

Translating verbs between MSA and arabic dialects through deep morphological analysis (Un système de traduction de verbes entre arabe standard et arabe dialectal par analyse morphologique profonde) [in French]
Ahmed Hamdi | Rahma Boujelbane | Nizar Habash | Alexis Nasr
Proceedings of TALN 2013 (Volume 1: Long Papers)

Generation of tunisian dialect corpora for adapting language models (Génération des corpus en dialecte tunisien pour la modélisation de langage d’un système de reconnaissance) [in French]
Rahma Boujelbane
Proceedings of RECITAL 2013

The Effects of Factorizing Root and Pattern Mapping in Bidirectional Tunisian - Standard Arabic Machine Translation
Ahmed Hamdi | Rahma Boujelbane | Nizar Habash | Alexis Nasr
Proceedings of Machine Translation Summit XIV: Papers