Chayma Fourati

2022

pdf abs
iCompass Working Notes for the Nuanced Arabic Dialect Identification Shared task
Abir Messaoudi | Chayma Fourati | Hatem Haddad | Moez BenHajhmida
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

We describe our submitted system to the Nuanced Arabic Dialect Identification (NADI) shared task. We tackled only the first subtask (Subtask 1). We used state-of-the-art Deep Learning models and pre-trained contextualized text representation models that we finetuned according to the downstream task in hand. As a first approach, we used BERT Arabic variants: MARBERT with its two versions MARBERT v1 and MARBERT v2, we combined MARBERT embeddings with a CNN classifier, and finally, we tested the Quasi-Recurrent Neural Networks (QRNN) model. The results found show that version 2 of MARBERT outperforms all of the previously mentioned models on Subtask 1.

pdf abs
iCompass at Arabic Hate Speech 2022: Detect Hate Speech Using QRNN and Transformers
Mohamed Aziz Bennessir | Malek Rhouma | Hatem Haddad | Chayma Fourati
Proceedinsg of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

This paper provides a detailed overview of the system we submitted as part of the OSACT2022 Shared Tasks on Fine-Grained Hate Speech Detection on Arabic Twitter, its outcome, and limitations. Our submission is accomplished with a hard parameter sharing Multi-Task Model that consisted of a shared layer containing state-of-the-art contextualized text representation models such as MarBERT, AraBERT, ArBERT and task specific layers that were fine-tuned with Quasi-recurrent neural networks (QRNN) for each down-stream subtask. The results show that MARBERT fine-tuned with QRNN outperforms all of the previously mentioned models.

2021

pdf abs
iCompass at NLP4IF-2021–Fighting the COVID-19 Infodemic
Wassim Henia | Oumayma Rjab | Hatem Haddad | Chayma Fourati
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

This paper provides a detailed overview of the system and its outcomes, which were produced as part of the NLP4IF Shared Task on Fighting the COVID-19 Infodemic at NAACL 2021. This task is accomplished using a variety of techniques. We used state-of-the-art contextualized text representation models that were fine-tuned for the downstream task in hand. ARBERT, MARBERT,AraBERT, Arabic ALBERT and BERT-base-arabic were used. According to the results, BERT-base-arabic had the highest 0.784 F1 score on the test set.

pdf abs
Introducing A large Tunisian Arabizi Dialectal Dataset for Sentiment Analysis
Chayma Fourati | Hatem Haddad | Abir Messaoudi | Moez BenHajhmida | Aymen Ben Elhaj Mabrouk | Malek Naski
Proceedings of the Sixth Arabic Natural Language Processing Workshop

On various Social Media platforms, people, tend to use the informal way to communicate, or write posts and comments: their local dialects. In Africa, more than 1500 dialects and languages exist. Particularly, Tunisians talk and write informally using Latin letters and numbers rather than Arabic ones. In this paper, we introduce a large common-crawl-based Tunisian Arabizi dialectal dataset dedicated for Sentiment Analysis. The dataset consists of a total of 100k comments (about movies, politic, sport, etc.) annotated manually by Tunisian native speakers as Positive, negative and Neutral. We evaluate our dataset on sentiment analysis task using the Bidirectional Encoder Representations from Transformers (BERT) as a contextual language model in its multilingual version (mBERT) as an embedding technique then combining mBERT with Convolutional Neural Network (CNN) as classifier. The dataset is publicly available.

We describe our submitted system to the 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic (Abu Farha et al., 2021). We tackled both subtasks, namely Sarcasm Detection (Subtask 1) and Sentiment Analysis (Subtask 2). We used state-of-the-art pretrained contextualized text representation models and fine-tuned them according to the downstream task in hand. As a first approach, we used Google’s multilingual BERT and then other Arabic variants: AraBERT, ARBERT and MARBERT. The results found show that MARBERT outperforms all of the previously mentioned models overall, either on Subtask 1 or Subtask 2.

Co-authors

Mohamed Aziz Bennessir 1

Malek Rhouma 1

Wassim Henia 1

Oumayma Rjab 1

Chayma Fourati

2022

2021

Co-authors

Venues