2021
DamascusTeam at NLP4IF2021: Fighting the Arabic COVID-19 Infodemic on Twitter Using AraBERT
Ahmad Hussein | Nada Ghneim | Ammar Joukhadar
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
The objective of this work was to introduce an effective approach, based on the AraBERT language model, for fighting the COVID-19 infodemic on Twitter. The approach is a two-step pipeline: the first step applies a series of pre-processing procedures that transform Twitter jargon, including emojis and emoticons, into plain text, and the second step fine-tunes a version of AraBERT pre-trained on plain text to classify the tweets with respect to their labels. Using language models pre-trained on plain text rather than on tweets was motivated by two points raised in the scientific literature: (1) pre-trained language models are widely available in many languages, which avoids the time-consuming and resource-intensive training of a model on tweets from scratch and lets the effort focus on fine-tuning; (2) available plain-text corpora are larger than tweet-only ones, allowing for better performance.
2020
Arabic Offensive Language Detection with Attention-based Deep Neural Networks
Bushr Haddad | Zoher Orabe | Anas Al-Abood | Nada Ghneim
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection
In this paper, we tackle the problem of offensive language and hate speech detection. We propose methods for data preprocessing and balancing, and then present the Convolutional Neural Network (CNN) and bidirectional Gated Recurrent Unit (GRU) models we used. We then augment these models with an attention layer. The best results were achieved using the bidirectional Gated Recurrent Unit augmented with an attention layer (Bi-GRU_ATT). Keywords: Abusive Language, Text Mining, Arabic Language, Social Media Mining, Deep Learning, Convolutional Neural Network, Gated Recurrent Unit, Attention Mechanism, Machine Learning.
DoTheMath at SemEval-2020 Task 12: Deep Neural Networks with Self Attention for Arabic Offensive Language Detection
Zoher Orabe | Bushr Haddad | Nada Ghneim | Anas Al-Abood
Proceedings of the Fourteenth Workshop on Semantic Evaluation
This paper describes our team's work and submission for SemEval 2020 (Sub-Task A), “Offensive Eval: Identifying and Categorizing Offensive Arabic Language in Arabic Social Media”. Our two baseline models were based on different levels of representation: character level vs. word level. For the word-level representation we implemented a convolutional neural network model and a bi-directional GRU model. For the character-level representation we implemented a hyper CNN and LSTM model. All of these models were further augmented with attention layers for better performance on our task. We also experimented with three types of static word embeddings (word2vec, FastText, and GloVe) in addition to emoji embeddings, and compared the performance of the different deep learning models on the dataset provided by this task. The bi-directional GRU model with attention achieved the highest score (an F1 score of 0.85) among all models.
2019
Arabic Dialogue Act Recognition for Textual Chatbot Systems
Alaa Joukhadar | Huda Saghergy | Leen Kweider | Nada Ghneim
Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019 - Short Papers
1995
Optimising Tools for the French Letter-to-Phone Grammar TOPH With a View to Phonographic Spelling Correction
Nada Ghneim | Véronique Aubergé
ROCLING 1995 Poster Papers