Mohamed Lichouri

2023

USTHB at ArAIEval’23 Shared Task: Disinformation Detection System based on Linguistic Feature Concatenation
Mohamed Lichouri | Khaled Lounnas | Aicha Zitouni | Houda Latrache | Rachida Djeradi
Proceedings of ArabicNLP 2023

In this paper, we examine several factors that affect the performance of Arabic disinformation detection in the ArAIEval’2023 shared task: surface preprocessing, morphological preprocessing, the FastText vector model, and the weighted fusion of TF-IDF features. For classification, we use the Linear Support Vector Classification (LSVC) model. In the evaluation phase, our system achieves micro-averaged F₁ scores of 76.70% and 50.46% for the binary and multiclass scenarios, respectively. The binary score is close to the average micro-averaged F₁ score of the systems submitted to the second subtask (77.96%), while the multiclass score is below the corresponding average (64.85%).

USTHB at NADI 2023 shared task: Exploring Preprocessing and Feature Engineering Strategies for Arabic Dialect Identification
Mohamed Lichouri | Khaled Lounnas | Aicha Zitouni | Houda Latrache | Rachida Djeradi
Proceedings of ArabicNLP 2023

In this paper, we analyze several key factors that influence the performance of Arabic dialect identification in the NADI’2023 shared task, focusing on the first subtask, country-level dialect identification. We study the effects of surface preprocessing, morphological preprocessing, the FastText vector model, and the weighted concatenation of TF-IDF features. For classification, we use the Linear Support Vector Classification (LSVC) model. In the evaluation phase, our system achieves an F₁ score of 62.51%, compared with an average F₁ score of 72.91% for the systems submitted to the first subtask.

2022

Towards an Automatic Dialect Identification System using Algerian Youtube Videos
Khaled Lounnas | Mohamed Lichouri | Mourad Abbas | Thissas Chahboub | Samir Salmi
Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)

An empirical Comparison of Arabic Named Entity Recognition Methods: Application to the ALP Corpus
Mohamed Lichouri
Proceedings of the Third International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2022) co-located with ICNLSP 2022

2021

Arabic Dialect Identification based on a Weighted Concatenation of TF-IDF Features
Mohamed Lichouri | Mourad Abbas | Khaled Lounnas | Besma Benaziz | Aicha Zitouni
Proceedings of the Sixth Arabic Natural Language Processing Workshop

In this paper, we analyze the impact of the weighted concatenation of TF-IDF features on Arabic dialect identification, as part of our participation in the NADI2021 shared task. The study covers two subtasks: subtask 1.1 (country-level MSA identification) and subtask 1.2 (country-level DA identification). The classifiers compared are Linear Support Vector Classification (LSVC), Linear Regression (LR), Perceptron, Stochastic Gradient Descent (SGD), Passive Aggressive (PA), Complement Naive Bayes (CNB), Multilayer Perceptron (MLP), and RidgeClassifier. In the evaluation phase, our system achieves F1 scores of 14.87% and 21.49% for country-level MSA and DA identification, respectively, somewhat below the average F1 scores of the submitted systems for the two subtasks (18.70% and 24.23%).
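
The weighted concatenation step can be illustrated in a few lines. A minimal sketch, assuming each feature type (for example word n-grams and character n-grams) has already been turned into a sparse TF-IDF vector stored as a dict; the helper `weighted_concat`, the feature names, and the weight values below are hypothetical placeholders, not values from the paper. In scikit-learn the same idea is usually expressed with `FeatureUnion` and its `transformer_weights` parameter.

```python
from typing import Dict, Sequence

# A sparse feature vector: feature name -> value.
SparseVec = Dict[str, float]


def weighted_concat(blocks: Sequence[SparseVec], weights: Sequence[float]) -> SparseVec:
    """Hypothetical helper: scale block i by weights[i], then concatenate.

    Feature names are prefixed with the block index so that, e.g., a word
    feature and a character feature with the same string stay distinct.
    """
    if len(blocks) != len(weights):
        raise ValueError("need one weight per feature block")
    out: SparseVec = {}
    for i, (block, w) in enumerate(zip(blocks, weights)):
        for name, value in block.items():
            out[f"{i}:{name}"] = w * value
    return out


# Usage with made-up TF-IDF values for one document.
word_block = {"salam": 0.8, "khoya": 0.6}  # hypothetical word features
char_block = {"sa": 0.5, "la": 0.5}        # hypothetical character bigrams
print(weighted_concat([word_block, char_block], [1.0, 0.5]))
# {'0:salam': 0.8, '0:khoya': 0.6, '1:sa': 0.25, '1:la': 0.25}
```

Down-weighting a block reduces its influence on a linear classifier such as LSVC, which is why the weights are worth tuning per feature type.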

Preprocessing Solutions for Detection of Sarcasm and Sentiment for Arabic
Mohamed Lichouri | Mourad Abbas | Besma Benaziz | Aicha Zitouni | Khaled Lounnas
Proceedings of the Sixth Arabic Natural Language Processing Workshop

This paper describes our approach to sentiment and sarcasm detection for Arabic in the ArSarcasm 2021 shared task. Because data preprocessing is crucial for successful learning, we applied a set of preprocessing steps to the dataset before training two classifiers: a Linear Support Vector Classifier (LSVC) and a Bidirectional Long Short-Term Memory (BiLSTM) network. The results show that, despite the simplicity of the approach, the LSVC model with Arabic normalization (NA) preprocessing and the BiLSTM architecture with an embedding input layer yield encouraging F1 scores of 33.71% and 57.80% for sarcasm and sentiment detection, respectively.
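
The abstract does not list the rules applied by the normalizing Arabic (NA) step. A minimal sketch, assuming a commonly used rule set (stripping short-vowel diacritics and tatweel, and unifying alef, alef maksura, and teh marbuta variants); `normalize_arabic` is a hypothetical helper, not the paper's code.

```python
import re

# Assumed rule set: common Arabic normalization steps, not necessarily
# the exact "NA" configuration used in the paper.
_DIACRITICS = re.compile("[\u064B-\u0652]")  # tanween, fatha, damma, kasra, shadda, sukun
_TATWEEL = "\u0640"                          # elongation character
_CHAR_MAP = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
})


def normalize_arabic(text: str) -> str:
    """Hypothetical helper: remove diacritics and tatweel, then unify letter variants."""
    text = _DIACRITICS.sub("", text)
    text = text.replace(_TATWEEL, "")
    return text.translate(_CHAR_MAP)
```

Collapsing spelling variants this way shrinks the vocabulary, so sparse features such as TF-IDF n-grams are shared across differently written forms of the same word.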

Machine Translation for Zero and Low-resourced Dialects using a New Extended Version of the Dialectal Parallel Corpus (Padic v2.0)
Mohamed Lichouri | Mourad Abbas
Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021)

TPT: An Empirical Term Selection for Arabic Text Categorization
Mourad Abbas | Mohamed Lichouri
Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021)

Towards Phone Number Recognition For Code Switched Algerian Dialect
Khaled Lounnas | Mourad Abbas | Mohamed Lichouri
Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021)

2020

Simple vs Oversampling-based Classification Methods for Fine Grained Arabic Dialect Identification in Twitter
Mohamed Lichouri | Mourad Abbas
Proceedings of the Fifth Arabic Natural Language Processing Workshop

In this paper, we describe our experiments on country-level Arabic dialect identification, in which we compare a set of classifiers. The best results were obtained with the Linear Support Vector Classification (LSVC) model combined with random oversampling (ROS), which yielded an F1 score of 18.74% in the post-evaluation phase. In the evaluation phase, our best submitted system achieved an F1 score of 18.27%, very close to the average F1 score (18.80%) of all submitted systems.
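
Random oversampling balances the training classes before fitting a classifier such as LSVC. A minimal, dependency-free sketch, assuming that minority-class samples are duplicated at random (with replacement) until every class matches the largest class; `random_oversample` is a hypothetical helper. The imbalanced-learn library provides the same behaviour through `RandomOverSampler`.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    xs: Sequence[T], ys: Sequence[str], seed: int = 0
) -> Tuple[List[T], List[str]]:
    """Hypothetical helper: randomly duplicate minority-class samples
    (with replacement) until every class has as many samples as the largest."""
    if not ys:
        return [], []
    rng = random.Random(seed)
    target = max(Counter(ys).values())
    by_label: Dict[str, List[T]] = {}
    for x, y in zip(xs, ys):
        by_label.setdefault(y, []).append(x)
    out_x, out_y = list(xs), list(ys)
    for label, members in by_label.items():
        # Draw the missing samples for this class; zero for the majority class.
        for x in rng.choices(members, k=target - len(members)):
            out_x.append(x)
            out_y.append(label)
    return out_x, out_y
```

Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.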

SpeechTrans@SMM4H’20: Impact of Preprocessing and N-grams on Automatic Classification of Tweets That Mention Medications
Mohamed Lichouri | Mourad Abbas
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

This paper describes our system for automatically classifying tweets that mention medications. We use a Decision Tree classifier and show that elementary preprocessing steps combined with TF-IDF n-gram features yield acceptable performance: an F1 score of 74.58% in the development phase and 63.70% in the test phase.
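
The abstract does not specify the n-gram range or the TF-IDF variant. A minimal sketch, assuming whitespace tokenization, word unigrams and bigrams, and the smoothed IDF with L2 normalization that scikit-learn's `TfidfVectorizer` uses by default; the helpers `word_ngrams` and `tfidf` are hypothetical. The resulting sparse vectors could then be passed to a classifier such as a decision tree.

```python
import math
from collections import Counter
from typing import Dict, List


def word_ngrams(text: str, n_min: int = 1, n_max: int = 2) -> List[str]:
    """Hypothetical helper: lowercase, split on whitespace, emit word n-grams."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs: List[str], n_min: int = 1, n_max: int = 2) -> List[Dict[str, float]]:
    """Raw counts times smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1,
    then L2-normalized per document."""
    grams = [Counter(word_ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each n-gram once per document.
    df = Counter(g for counts in grams for g in counts)
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
    vectors = []
    for counts in grams:
        vec = {g: tf * idf[g] for g, tf in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({g: v / norm for g, v in vec.items()} if norm else vec)
    return vectors
```

An n-gram that appears in every tweet (such as a common drug name) receives the minimum IDF of 1, so rarer, more discriminative n-grams carry more weight.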

2019

ST MADAR 2019 Shared Task: Arabic Fine-Grained Dialect Identification
Mourad Abbas | Mohamed Lichouri | Abed Alhakim Freihat
Proceedings of the Fourth Arabic Natural Language Processing Workshop

This paper describes our solution for the MADAR 2019 Arabic Fine-Grained Dialect Identification task. The solution uses a set of classifiers trained on character and word features: Support Vector Machines (SVM), Bernoulli Naive Bayes (BNB), Multinomial Naive Bayes (MNB), Logistic Regression (LR), Stochastic Gradient Descent (SGD), Passive Aggressive (PA), and Perceptron (PC). The system achieved competitive results, with scores of 62.87% on the development set and 62.12% on the test set.

An Arabic Multi-Domain Spoken Language Understanding System
Mohamed Lichouri | Mourad Abbas | Rachida Djeradi | Amar Djeradi
Proceedings of the 3rd International Conference on Natural Language and Speech Processing (ICNLSP 2019)

Building a Speech Corpus based on Arabic Podcasts for Language and Dialect Identification
Khaled Lounnas | Mourad Abbas | Mohamed Lichouri
Proceedings of the 3rd International Conference on Natural Language and Speech Processing (ICNLSP 2019)

ST NSURL 2019 Shared Task: Semantic Question Similarity in Arabic
Mohamed Lichouri | Mourad Abbas | Besma Benaziz | Abed Alhakim Freihat
Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019 - Short Papers