Malak Abdullah


JUST-DEEP at SemEval-2022 Task 4: Using Deep Learning Techniques to Reveal Patronizing and Condescending Language
Mohammad Makahleh | Naba Bani Yaseen | Malak Abdullah
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Classification of language that favors or condones vulnerable communities (e.g., refugees, homeless, widows) has been considered a challenging task and a critical step in NLP applications. Moreover, the spread of this language among people and on social media harms society and harms the people concerned. Therefore, the classification of this language is considered a significant challenge for researchers in the world. In this paper, we propose JUST-DEEP architecture to classify a text and determine if it contains any form of patronizing and condescending language (Task 4- Subtask 1). The architecture uses state-of-art pre-trained models and empowers ensembling techniques that outperform the baseline (RoBERTa) in the SemEval-2022 task4 with a 0.502 F1 score.

YMAI at SemEval-2022 Task 5: Detecting Misogyny in Memes using VisualBERT and MMBT MultiModal Pre-trained Models
Mohammad Habash | Yahya Daqour | Malak Abdullah | Mahmoud Al-Ayyoub
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents a deep learning system that contends at SemEval-2022 Task 5. The goal is to detect the existence of misogynous memes in sub-task A. At the same time, the advanced multi-label sub-task B categorizes the misogyny of misogynous memes into one of four types: stereotype, shaming, objectification, and violence. The Ensemble technique has been used for three multi-modal deep learning models: two MMBT models and VisualBERT. Our proposed system ranked 17 place out of 83 participant teams with an F1-score of 0.722 in sub-task A, which shows a significant performance improvement over the baseline model’s F1-score of 0.65.

SarcasmDet at SemEval-2022 Task 6: Detecting Sarcasm using Pre-trained Transformers in English and Arabic Languages
Malak Abdullah | Dalya Alnore | Safa Swedat | Jumana Khrais | Mahmoud Al-Ayyoub
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper presents solution systems for task 6 at SemEval2022, iSarcasmEval: Intended Sarcasm Detection In English and Arabic. The shared task 6 consists of three sub-task. We participated in subtask A for both languages, Arabic and English. The goal of subtask A is to predict if a tweet would be considered sarcastic or not. The proposed solution SarcasmDet has been developed using the state-of-the-art Arabic and English pre-trained models AraBERT, MARBERT, BERT, and RoBERTa with ensemble techniques. The paper describes the SarcasmDet architecture with the fine-tuning of the best hyperparameter that led to this superior system. Our model ranked seventh out of 32 teams in subtask A- Arabic with an f1-sarcastic of 0.4305 and Seventeen out of 42 teams with f1-sarcastic 0.3561. However, we built another model to score f-1 sarcastic with 0.43 in English after the deadline. Both Models (Arabic and English scored 0.43 as f-1 sarcastic with ranking seventh).


SarcasmDet at Sarcasm Detection Task 2021 in Arabic using AraBERT Pretrained Model
Dalya Faraj | Dalya Faraj | Malak Abdullah
Proceedings of the Sixth Arabic Natural Language Processing Workshop

This paper presents one of the top five winning solutions for the Shared Task on Sarcasm and Sentiment Detection in Arabic (Subtask-1 Sarcasm Detection). The goal of the task is to identify whether a tweet is sarcastic or not. Our solution has been developed using ensemble technique with AraBERT pre-trained model. We describe the architecture of the submitted solution in the shared task. We also provide the experiments and the hyperparameter tuning that lead to this result. Besides, we discuss and analyze the results by comparing all the models that we trained or tested to achieve a better score in a table design. Our model is ranked fifth out of 27 teams with an F1 score of 0.5985. It is worth mentioning that our model achieved the highest accuracy score of 0.7830

R00 at NLP4IF-2021 Fighting COVID-19 Infodemic with Transformers and More Transformers
Ahmed Qarqaz | Dia Abujaber | Malak Abdullah
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

This paper describes the winning model in the Arabic NLP4IF shared task for fighting the COVID-19 infodemic. The goal of the shared task is to check disinformation about COVID-19 in Arabic tweets. Our proposed model has been ranked 1st with an F1-Score of 0.780 and an Accuracy score of 0.762. A variety of transformer-based pre-trained language models have been experimented with through this study. The best-scored model is an ensemble of AraBERT-Base, Asafya-BERT, and ARBERT models. One of the study’s key findings is showing the effect the pre-processing can have on every model’s score. In addition to describing the winning model, the current study shows the error analysis.

SarcasmDet at SemEval-2021 Task 7: Detect Humor and Offensive based on Demographic Factors using RoBERTa Pre-trained Model
Dalya Faraj | Malak Abdullah
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents one of the top winning solution systems for task 7 at SemEval2021, HaHackathon: Detecting and Rating Humor and Offense. This competition is divided into two tasks, task1 with three sub-tasks 1a,1b, and 1c, and task2. The goal for task1 is to predict if the text would be considered humorous or not, and if it is yes, then predict how humorous it is and whether the humor rating would be perceived as controversial. The goal of the task2 is to predict how the text is considered offensive for users in general. Our solution has been developed using RoBERTa pre-trained model with ensemble techniques. The paper describes the submitted solution system’s architecture with the experiments and the hyperparameter tuning that led to this robust system. Our model ranked third and fourth places out of 50 teams in tasks 1c and 1a with F1-Score of 0.6270 and 0.9675, respectively. At the same time, the model ranked one of the top 10 models in task 1b and task 2 with an RMSE scores of 0.5446 and 0.4469, respectively.

JUST-BLUE at SemEval-2021 Task 1: Predicting Lexical Complexity using BERT and RoBERTa Pre-trained Language Models
Tuqa Bani Yaseen | Qusai Ismail | Sarah Al-Omari | Eslam Al-Sobh | Malak Abdullah
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Predicting the complexity level of a word or a phrase is considered a challenging task. It is even recognized as a crucial step in numerous NLP applications, such as text rearrangements and text simplification. Early research treated the task as a binary classification task, where the systems anticipated the existence of a word’s complexity (complex versus uncomplicated). Other studies had been designed to assess the level of word complexity using regression models or multi-labeling classification models. Deep learning models show a significant improvement over machine learning models with the rise of transfer learning and pre-trained language models. This paper presents our approach that won the first rank in the SemEval-task1 (sub stask1). We have calculated the degree of word complexity from 0-1 within a text. We have been ranked first place in the competition using the pre-trained language models Bert and RoBERTa, with a Pearson correlation score of 0.788.


JUST System for WMT20 Chat Translation Task
Roweida Mohammed | Mahmoud Al-Ayyoub | Malak Abdullah
Proceedings of the Fifth Conference on Machine Translation

Machine Translation (MT) is a sub-field of Artificial Intelligence and Natural Language Processing that investigates and studies the ways of automatically translating a text from one language to another. In this paper, we present the details of our submission to the WMT20 Chat Translation Task, which consists of two language directions, English –> German and German –> English. The major feature of our system is applying a pre-trained BERT embedding with a bidirectional recurrent neural network. Our system ensembles three models, each with different hyperparameters. Despite being trained on a very small corpus, our model produces surprisingly good results.

JUSTMasters at SemEval-2020 Task 3: Multilingual Deep Learning Model to Predict the Effect of Context in Word Similarity
Nour Al-khdour | Mutaz Bni Younes | Malak Abdullah | Mohammad AL-Smadi
Proceedings of the Fourteenth Workshop on Semantic Evaluation

There is a growing research interest in studying word similarity. Without a doubt, two similar words in a context may considered different in another context. Therefore, this paper investigates the effect of the context in word similarity. The SemEval-2020 workshop has provided a shared task (Task 3: Predicting the (Graded) Effect of Context in Word Similarity). In this task, the organizers provided unlabeled datasets for four languages, English, Croatian, Finnish and Slovenian. Our team, JUSTMasters, has participated in this competition in the two subtasks: A and B. Our approach has used a weighted average ensembling method for different pretrained embeddings techniques for each of the four languages. Our proposed model outperformed the baseline models in both subtasks and acheived the best result for subtask 2 in English and Finnish, with score 0.725 and 0.68 respectively. We have been ranked the sixth for subtask 1, with scores for English, Croatian, Finnish, and Slovenian as follows: 0.738, 0.44, 0.546, 0.512.

TeamJUST at SemEval-2020 Task 4: Commonsense Validation and Explanation Using Ensembling Techniques
Roweida Mohammed | Malak Abdullah
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Common sense for natural language processing methods has been attracting a wide research interest, recently. Estimating automatically whether a sentence makes sense or not is considered an essential question. Task 4 in the International Workshop SemEval 2020 has provided three subtasks (A, B, and C) that challenges the participants to build systems for distinguishing the common sense statements from those that do not make sense. This paper describes TeamJUST’s approach for participating in subtask A to differentiate between two sentences in English and classify them into two classes: common sense and uncommon sense statements. Our approach depends on ensembling four different state-of-the-art pre-trained models (BERT, ALBERT, Roberta, and XLNet). Our baseline model which we used only the pre-trained model of BERT has scored 89.1, while the TeamJUST model outperformed the baseline model with an accuracy score of 96.2. We have improved the results in the post-evaluation period to achieve our best result, which would rank the 4th in the competition if we had the chance to use our latest experiment.

MLEngineer at SemEval-2020 Task 7: BERT-Flair Based Humor Detection Model (BFHumor)
Fara Shatnawi | Malak Abdullah | Mahmoud Hammad
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Task 7, Assessing the Funniness of Edited News Headlines, in the International Workshop SemEval2020 introduces two sub-tasks to predict the funniness values of edited news headlines from the Reddit website. This paper proposes the BFHumor model of the MLEngineer team that participates in both sub-tasks in this competition. The BFHumor’s model is defined as a BERT-Flair based humor detection model that is a combination of different pre-trained models with various Natural Language Processing (NLP) techniques. The Bidirectional Encoder Representations from Transformers (BERT) regressor is considered the primary pre-trained model in our approach, whereas Flair is the main NLP library. It is worth mentioning that the BFHumor model has been ranked 4th in sub-task1 with a root mean square error (RMSE) value of 0.51966, and it is 0.02 away from the first ranked model. Also, the team is ranked 12th in the sub-task2 with an accuracy of 0.62291, which is 0.05 away from the top-ranked model. Our results indicate that the BFHumor model is one of the top models for detecting humor in the text.

JUST at SemEval-2020 Task 11: Detecting Propaganda Techniques Using BERT Pre-trained Model
Ola Altiti | Malak Abdullah | Rasha Obiedat
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents the submission to semeval-2020 task 11, Detection of Propaganda Techniques in News Articles. Knowing that there are two subtasks in this competition, we have participated in the Technique Classification subtask (TC), which aims to identify the propaganda techniques used in a specific propaganda span. We have used and implemented various models to detect propaganda. Our proposed model is based on BERT uncased pre-trained language model as it has achieved state-of-the-art performance on multiple NLP benchmarks. The performance results of our proposed model have scored 0.55307 F1-Score, which outperforms the baseline model provided by the organizers with 0.2519 F1-Score, and our model is 0.07 away from the best performing team. Compared to other participating systems, our submission is ranked 15th out of 31 participants.


EmoDet at SemEval-2019 Task 3: Emotion Detection in Text using Deep Learning
Hani Al-Omari | Malak Abdullah | Nabeel Bassam
Proceedings of the 13th International Workshop on Semantic Evaluation

Task 3, EmoContext, in the International Workshop SemEval 2019 provides training and testing datasets for the participant teams to detect emotion classes (Happy, Sad, Angry, or Others). This paper proposes a participating system (EmoDet) to detect emotions using deep learning architecture. The main input to the system is a combination of Word2Vec word embeddings and a set of semantic features (e.g. from AffectiveTweets Weka-package). The proposed system (EmoDet) ensembles a fully connected neural network architecture and LSTM neural network to obtain performance results that show substantial improvements (F1-Score 0.67) over the baseline model provided by Task 3 organizers (F1-score 0.58).

JUSTDeep at NLP4IF 2019 Task 1: Propaganda Detection using Ensemble Deep Learning Models
Hani Al-Omari | Malak Abdullah | Ola AlTiti | Samira Shaikh
Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda

The internet and the high use of social media have enabled the modern-day journalism to publish, share and spread news that is difficult to distinguish if it is true or fake. Defining “fake news” is not well established yet, however, it can be categorized under several labels: false, biased, or framed to mislead the readers that are characterized as propaganda. Digital content production technologies with logical fallacies and emotional language can be used as propaganda techniques to gain more readers or mislead the audience. Recently, several researchers have proposed deep learning (DL) models to address this issue. This research paper provides an ensemble deep learning model using BiLSTM, XGBoost, and BERT to detect propaganda. The proposed model has been applied on the dataset provided by the challenge NLP4IF 2019, Task 1 Sentence Level Classification (SLC) and it shows a significant performance over the baseline model.


TeamUNCC at SemEval-2018 Task 1: Emotion Detection in English and Arabic Tweets using Deep Learning
Malak Abdullah | Samira Shaikh
Proceedings of the 12th International Workshop on Semantic Evaluation

Task 1 in the International Workshop SemEval 2018, Affect in Tweets, introduces five subtasks (El-reg, El-oc, V-reg, V-oc, and E-c) to detect the intensity of emotions in English, Arabic, and Spanish tweets. This paper describes TeamUNCC’s system to detect emotions in English and Arabic tweets. Our approach is novel in that we present the same architecture for all the five subtasks in both English and Arabic. The main input to the system is a combination of word2vec and doc2vec embeddings and a set of psycholinguistic features (e.g. from AffectTweets Weka-package). We apply a fully connected neural network architecture and obtain performance results that show substantial improvements in Spearman correlation scores over the baseline models provided by Task 1 organizers, (ranging from 0.03 to 0.23). TeamUNCC’s system ranks third in subtask El-oc and fourth in other subtasks for Arabic tweets.