Rahma Boujelbane

2024

pdf abs
ANLP RG at StanceEval2024: Comparative Evaluation of Stance, Sentiment and Sarcasm Detection
Mezghani Amal | Rahma Boujelbane | Mariem Ellouze
Proceedings of The Second Arabic Natural Language Processing Conference

As part of our study, we worked on three tasks:stance detection, sarcasm detection and senti-ment analysis using fine-tuning techniques onBERT-based models. Fine-tuning parameterswere carefully adjusted over multiple iterationsto maximize model performance. The threetasks are essential in the field of natural lan-guage processing (NLP) and present uniquechallenges. Stance detection is a critical taskaimed at identifying a writer’s stances or view-points in relation to a topic. Sarcasm detectionseeks to spot sarcastic expressions, while senti-ment analysis determines the attitude expressedin a text. After numerous experiments, we iden-tified Arabert-twitter as the model offering thebest performance for all three tasks. In particu-lar, it achieves a macro F-score of 78.08% forstance detection, a macro F1-score of 59.51%for sarcasm detection and a macro F1-score of64.57% for sentiment detection. .Our source code is available at https://github.com/MezghaniAmal/Mawqif

2023

pdf abs
ANLP-RG at NADI 2023 shared task: Machine Translation of Arabic Dialects: A Comparative Study of Transformer Models
Wiem Derouich | Sameh Kchaou | Rahma Boujelbane
Proceedings of ArabicNLP 2023

In this paper, we present our findings within the context of the NADI-2023 Shared Task (Subtask 2). Our task involves developing a translation model from the Palestinian, Jordanian, Emirati, and Egyptian dialects to Modern Standard Arabic (MSA) using the MADAR parallel corpus, even though it lacks a parallel subset for the Emirati dialect. To address this challenge, we conducted a comparative analysis, evaluating the fine-tuning results of various transformer models using the MADAR corpus as a learning resource. Additionally, we assessed the effectiveness of existing translation tools in achieving our translation objectives. The best model achieved a BLEU score of 11.14% on the dev set and 10.02 on the test set.

2022

pdf abs
Benchmarking transfer learning approaches for sentiment analysis of Arabic dialect
Emna Fsih | Sameh Kchaou | Rahma Boujelbane | Lamia Hadrich-Belguith
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

Arabic has a widely varying collection of dialects. With the explosion of the use of social networks, the volume of written texts has remarkably increased. Most users express themselves using their own dialect. Unfortunately, many of these dialects remain under-studied due to the scarcity of resources. Researchers and industry practitioners are increasingly interested in analyzing users’ sentiments. In this context, several approaches have been proposed, namely: traditional machine learning, deep learning transfer learning and more recently few-shot learning approaches. In this work, we compare their efficiency as part of the NADI competition to develop a country-level sentiment analysis model. Three models were beneficial for this sub-task: The first based on Sentence Transformer (ST) and achieve 43.23% on DEV set and 42.33% on TEST set, the second based on CAMeLBERT and achieve 47.85% on DEV set and 41.72% on TEST set and the third based on multi-dialect BERT model and achieve 66.72% on DEV set and 39.69% on TEST set.

pdf abs
Standardisation of Dialect Comments in Social Networks in View of Sentiment Analysis : Case of Tunisian Dialect
Saméh Kchaou | Rahma Boujelbane | Emna Fsih | Lamia Hadrich-Belguith
Proceedings of the Thirteenth Language Resources and Evaluation Conference

With the growing access to the internet, the spoken Arabic dialect language becomes informal languages written in social media. Most users post comments using their own dialect. This linguistic situation inhibits mutual understanding between internet users and makes difficult to use computational approaches since most Arabic resources are intended for the formal language: Modern Standard Arabic (MSA). In this paper, we present a pipeline to standardize the written texts in social networks by translating them to the standard language MSA. We fine-tun at first an identification bert-based model to select Tunisian Dialect (TD) from MSA and other dialects. Then, we learned transformer model to translate TD to MSA. The final system includes the translated TD text and the originally text written in MSA. Each of these steps was evaluated on the same test corpus. In order to test the effectiveness of the approach, we compared two opinion analysis models, the first intended for the Sentiment Analysis (SA) of dialect texts and the second for the MSA texts. We concluded that through standardization we obtain the best score.

pdf
A deep sentiment analysis of Tunisian dialect comments on multi-domain posts in different social media platforms
Emna Fsih | Rahma Boujelbane | Lamia Hadrich Belguith
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

2020

pdf abs
Parallel resources for Tunisian Arabic Dialect Translation
Saméh Kchaou | Rahma Boujelbane | Lamia Hadrich-Belguith
Proceedings of the Fifth Arabic Natural Language Processing Workshop

The difficulty of processing dialects is clearly observed in the high cost of building representative corpus, in particular for machine translation. Indeed, all machine translation systems require a huge amount and good management of training data, which represents a challenge in a low-resource setting such as the Tunisian Arabic dialect. In this paper, we present a data augmentation technique to create a parallel corpus for Tunisian Arabic dialect written in social media and standard Arabic in order to build a Machine Translation (MT) model. The created corpus was used to build a sentence-based translation model. This model reached a BLEU score of 15.03% on a test set, while it was limited to 13.27% utilizing the corpus without augmentation.

2014

Tunisian Arabic is a dialect of the Arabic language spoken in Tunisia. Tunisian Arabic is an under-resourced language. It has neither a standard orthography nor large collections of written text and dictionaries. Actually, there is no strict separation between Modern Standard Arabic, the official language of the government, media and education, and Tunisian Arabic; the two exist on a continuum dominated by mixed forms. In this paper, we present a conventional orthography for Tunisian Arabic, following a previous effort on developing a conventional orthography for Dialectal Arabic (or CODA) demonstrated for Egyptian Arabic. We explain the design principles of CODA and provide a detailed description of its guidelines as applied to Tunisian Arabic.

pdf
De l’arabe standard vers l’arabe dialectal : projection de corpus et ressources linguistiques en vue du traitement automatique de l’oral dans les médias tunisiens [From Modern Standard Arabic to Tunisian dialect: corpus projection and linguistic resources towards the automatic processing of speech in the Tunisian media]
Rahma Boujelbane | Mariem Ellouze | Frédéric Béchet | Lamia Belguith
Traitement Automatique des Langues, Volume 55, Numéro 2 : Traitement automatique du langage parlé [Spoken language processing]