2022
iCompass Working Notes for the Nuanced Arabic Dialect Identification Shared task
Abir Messaoudi | Chayma Fourati | Hatem Haddad | Moez BenHajhmida
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
We describe the system we submitted to the Nuanced Arabic Dialect Identification (NADI) shared task. We tackled only the first subtask (Subtask 1). We used state-of-the-art Deep Learning models and pre-trained contextualized text representation models that we fine-tuned for the downstream task at hand. As a first approach, we used Arabic BERT variants, namely MARBERT in its two versions (MARBERT v1 and MARBERT v2); we then combined MARBERT embeddings with a CNN classifier; and finally, we tested the Quasi-Recurrent Neural Network (QRNN) model. The results show that MARBERT v2 outperforms all of the other models on Subtask 1.
TuniSER: Toward a Tunisian Speech Emotion Recognition System
Abir Messaoudi | Hatem Haddad | Moez Benhaj Hmida | Mohamed Graiet
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)
2021
Introducing A large Tunisian Arabizi Dialectal Dataset for Sentiment Analysis
Chayma Fourati | Hatem Haddad | Abir Messaoudi | Moez BenHajhmida | Aymen Ben Elhaj Mabrouk | Malek Naski
Proceedings of the Sixth Arabic Natural Language Processing Workshop
On various social media platforms, people tend to communicate, and to write posts and comments, informally: in their local dialects. More than 1,500 dialects and languages exist in Africa. In particular, Tunisians talk and write informally using Latin letters and numbers rather than Arabic script. In this paper, we introduce a large common-crawl-based Tunisian Arabizi dialectal dataset dedicated to sentiment analysis. The dataset consists of a total of 100k comments (about movies, politics, sports, etc.) manually annotated by native Tunisian speakers as positive, negative, or neutral. We evaluate our dataset on the sentiment analysis task using the multilingual version of Bidirectional Encoder Representations from Transformers (mBERT) as a contextual embedding technique, and then combining mBERT with a Convolutional Neural Network (CNN) classifier. The dataset is publicly available.
iCompass at Shared Task on Sarcasm and Sentiment Detection in Arabic
Malek Naski | Abir Messaoudi | Hatem Haddad | Moez BenHajhmida | Chayma Fourati | Aymen Ben Elhaj Mabrouk
Proceedings of the Sixth Arabic Natural Language Processing Workshop
We describe the system we submitted to the 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic (Abu Farha et al., 2021). We tackled both subtasks, namely Sarcasm Detection (Subtask 1) and Sentiment Analysis (Subtask 2). We used state-of-the-art pretrained contextualized text representation models and fine-tuned them for the downstream task at hand. As a first approach, we used Google’s multilingual BERT, followed by the Arabic variants AraBERT, ARBERT and MARBERT. The results show that MARBERT outperforms all of the other models on both Subtask 1 and Subtask 2.
2020
iCompass at SemEval-2020 Task 12: From a Syntax-ignorant N-gram Embeddings Model to a Deep Bidirectional Language Model
Abir Messaoudi | Hatem Haddad | Moez Ben Haj Hmida
Proceedings of the Fourteenth Workshop on Semantic Evaluation
We describe the system we submitted to SemEval 2020. We tackled Task 12, entitled “Multilingual Offensive Language Identification in Social Media”, specifically subtask 4A-Arabic. We propose three Arabic offensive language identification models: Tw-StAR, BERT and BERT+BiLSTM. Two Arabic abusive/hate speech datasets, L-HSAB and T-HSAB, were added to the training data. The final submission was chosen based on the best performance, which was achieved by the BERT+BiLSTM model.