2022
Ara-Women-Hate: An Annotated Corpus Dedicated to Hate Speech Detection against Women in the Arabic Community
Imane Guellil, Ahsan Adeel, Faical Azouaou, Mohamed Boubred, Yousra Houichi, Akram Abdelhaq Moumna
Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference
In this paper, an approach for detecting hate speech against women in the Arabic community on social media (e.g. YouTube) is proposed. Similar work has been presented for other languages, such as English; however, to the best of our knowledge, little work has been conducted on Arabic. A new hate speech corpus (Arabic_fr_en) is developed using three different annotators. For corpus validation, three machine learning algorithms are used: a deep Convolutional Neural Network (CNN), a long short-term memory (LSTM) network and a Bi-directional LSTM (Bi-LSTM) network. Simulation results demonstrate the best performa
2021
ONE: Toward ONE model, ONE algorithm, ONE corpus dedicated to sentiment analysis of Arabic/Arabizi and its dialects
Imane Guellil, Faical Azouaou, Fodil Benali, Hachani Ala-Eddine
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Arabic is the official language of 22 countries and is spoken by more than 400 million people. Each of these countries uses at least one dialect for daily conversation, so Arabic has at least 22 dialects. Each dialect can be written in either Arabic or Arabizi script. Most recent research focuses on constructing a language model and a training corpus for each dialect in each script. Following this approach means constructing 46 different resources (including Modern Standard Arabic, MSA) to handle a single language. In this paper, we extract ONE corpus and propose ONE algorithm to automatically construct ONE training corpus, using ONE classification model architecture, for sentiment analysis of MSA and its different dialects. After manual review of the training corpus, the obtained results outperform all results reported in the literature for the targeted test corpora.
2018
Arabizi sentiment analysis based on transliteration and automatic corpus annotation
Imane Guellil, Ahsan Adeel, Faical Azouaou, Fodil Benali, Ala-eddine Hachani, Amir Hussain
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Arabizi is a way of writing Arabic text that relies on Latin letters, numerals and punctuation rather than Arabic letters. In the literature, the difficulties associated with Arabizi sentiment analysis have been underestimated, principally because of the complexity of Arabizi. In this paper, we present an approach to automatically classify the sentiment of Arabizi messages as positive or negative. In the proposed approach, Arabizi messages are first transliterated into Arabic. The sentiment of the transliterated corpus is then classified automatically using an automatically annotated corpus. For corpus validation, shallow machine learning algorithms such as Support Vector Machine (SVM) and Naive Bayes (NB) are used. Simulation results show that the NB algorithm outperforms all the others. The highest F1-scores achieved are 78% and 76% for the manually and automatically transliterated datasets, respectively. Ongoing work aims to improve the transliteration module and the annotated sentiment dataset.
2017
Une approche fondée sur les lexiques d’analyse de sentiments du dialecte algérien [A lexicon-based approach for sentiment analysis in the Algerian dialect]
Imane Guellil, Faical Azouaou, Houda Saâdane, Nasredine Semmar
Traitement Automatique des Langues, Volume 58, Numéro 3 : Traitement automatique de l'arabe et des langues apparentées [NLP for Arabic and Related Languages]