Marwan Torki

2023

AlexU-AIC at WojoodNER shared task: Sequence Labeling vs MRC and SWA for Arabic Named Entity Recognition
Shereen Elkordi | Noha Adly | Marwan Torki
Proceedings of ArabicNLP 2023

Named entity recognition (NER) is one of many challenging tasks in Arabic Natural Language Processing. It is also the base of many critical downstream tasks that help in understanding the source of major trends and public opinion. In this paper, we describe our submission to the NER Shared Task of ArabicNLP 2023. We used a simple machine reading comprehension-based technique in the Flat NER Subtask, ranking eighth on the leaderboard, while we fine-tuned a language model for the Nested NER Subtask, ranking third on the leaderboard.

Alex-U 2023 NLP at WojoodNER shared task: AraBINDER (Bi-encoder for Arabic Named Entity Recognition)
Mariam Hussein | Sarah Khaled | Marwan Torki | Nagwa El-Makky
Proceedings of ArabicNLP 2023

Named Entity Recognition (NER) is a crucial task in natural language processing that facilitates the extraction of vital information from text. However, NER for Arabic presents a significant challenge due to the language’s unique characteristics. In this paper, we introduce AraBINDER, our submission to the Wojood NER Shared Task 2023 (ArabicNLP 2023). The shared task comprises two sub-tasks: sub-task 1 focuses on Flat NER, while sub-task 2 centers on Nested NER. We have participated in both sub-tasks. The Bi-Encoder has proven its efficiency for NER in English. We employ AraBINDER (Arabic Bi-Encoder for Named Entity Recognition), which uses the power of two transformer encoders and employs contrastive learning to map candidate text spans and entity types into the same vector representation space. This approach frames NER as a representation learning problem that maximizes the similarity between the vector representations of an entity mention and its type. Our experiments reveal that AraBINDER achieves a micro F-1 score of 0.918 for Flat NER and 0.9 for Nested NER on the Wojood dataset.
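
The idea of scoring a candidate span against entity types in a shared vector space can be illustrated with a minimal sketch. It is not the AraBINDER implementation: the toy vectors stand in for the outputs of the two transformer encoders, the loss is a generic softmax-based contrastive objective, and the names `contrastive_loss`, `predict_type`, and `temperature` are hypothetical.

```python
import math

def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(span_vec, type_vecs, gold_type, temperature=1.0):
    """Softmax cross-entropy over span/type similarities (assumed objective).

    The loss is low when the span vector is more similar to its gold
    entity type than to the other types.
    """
    scores = {t: dot(span_vec, v) / temperature for t, v in type_vecs.items()}
    top = max(scores.values())
    # Numerically stable log-sum-exp over all type scores.
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return log_z - scores[gold_type]

def predict_type(span_vec, type_vecs):
    """Return the entity type whose vector is most similar to the span."""
    return max(type_vecs, key=lambda t: dot(span_vec, type_vecs[t]))

# Toy vectors standing in for encoder outputs (illustrative only).
type_vecs = {"PERS": [1.0, 0.0], "LOC": [0.0, 1.0]}
span_vec = [2.0, 0.0]
predicted = predict_type(span_vec, type_vecs)
```

Training would lower this loss for each labeled span, which pulls the span vector toward its type vector and away from the others.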

2022

Arabic Dialect Identification with a Few Labeled Examples Using Generative Adversarial Networks
Mahmoud Yusuf | Marwan Torki | Nagwa El-Makky
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Given the challenges and complexities introduced while dealing with Dialect Arabic (DA) variations, Transformer-based models, e.g., BERT, outperformed other models in dealing with the DA identification task. However, to fine-tune these models, a large corpus is required. Getting a large number of high-quality labeled examples for some Dialect Arabic classes is challenging and time-consuming. In this paper, we address the Dialect Arabic Identification task. We extend the transformer-based models, ARBERT and MARBERT, with unlabeled data in a generative adversarial setting using Semi-Supervised Generative Adversarial Networks (SS-GAN). Our model produced high-quality embeddings for the Dialect Arabic examples and helped the model generalize better for the downstream classification task given few labeled examples. Experimental results showed that our model reached better performance and faster convergence when only a few labeled examples are available.

AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify
Ahmad Shapiro | Ayman Khalafallah | Marwan Torki
Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection

Online presence on social media platforms such as Facebook and Twitter has become a daily habit for internet users. Despite the vast number of services the platforms offer their users, users suffer from cyber-bullying, which further leads to mental abuse and may escalate to cause physical harm to individuals or targeted groups. In this paper, we present our submission to the Arabic Hate Speech 2022 Shared Task Workshop (OSACT5 2022) using the associated Arabic Twitter dataset. The Shared Task consists of 3 sub-tasks. Sub-task A focuses on detecting whether the tweet is offensive or not. Then, for offensive tweets, Sub-task B focuses on detecting whether the tweet is hate speech or not. Finally, for hate speech tweets, Sub-task C focuses on detecting the fine-grained type of hate speech among six different classes. Transformer models proved their efficiency in classification tasks, but they suffer from over-fitting when fine-tuned on a small or an imbalanced dataset. We overcome this limitation by investigating multiple training paradigms such as contrastive learning and multi-task learning along with classification fine-tuning and an ensemble of our top 5 performers. Our proposed solution achieved 0.841, 0.817, and 0.476 macro F1-average in sub-tasks A, B, and C respectively.

AraDepSu: Detecting Depression and Suicidal Ideation in Arabic Tweets Using Transformers
Mariam Hassib | Nancy Hossam | Jolie Sameh | Marwan Torki
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

Among mental health diseases, depression is one of the most severe, as it often leads to suicide, which is the fourth leading cause of death in the Middle East. In the Middle East, Egypt has the highest percentage of suicidal deaths; due to this, it is important to identify depression and suicidal ideation. In Arabic culture, there is a lack of awareness regarding the importance of diagnosing and living with mental health diseases. However, as noted over the last couple of years, people all over the world, including Arab citizens, tend to express their feelings openly on social media. Twitter is the most popular platform designed to enable the expression of emotions through short texts, pictures, or videos. This paper aims to predict depression and depression with suicidal ideation. Because people tend to treat social media as their personal diaries and share their deepest thoughts on social media platforms, social data contain valuable information that can be used to identify users’ psychological states. We create the AraDepSu dataset by scraping tweets from Twitter and manually labelling them. We expand the diversity of user tweets by adding a neutral label (“neutral”), so the dataset includes three classes (“depressed”, “suicidal”, “neutral”). Then we train 30+ different transformer models on our AraDepSu dataset. We find that the best-performing model is MARBERT with accuracy, precision, recall and F1-Score values of 91.20%, 88.74%, 88.50% and 88.75%.

AlexU-AL at SemEval-2022 Task 6: Detecting Sarcasm in Arabic Text Using Deep Learning Techniques
Aya Lotfy | Marwan Torki | Nagwa El-Makky
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

Sarcasm detection is an important task in Natural Language Understanding. Sarcasm is a form of verbal irony that occurs when there is a discrepancy between the literal and intended meanings of an expression. In this paper, we use the tweets of the Arabic dataset provided by SemEval-2022 task 6 to train deep learning classifiers to solve the sub-tasks A and C associated with the dataset. Sub-task A is to determine if the tweet is sarcastic or not. For sub-task C, given a sarcastic text and its non-sarcastic rephrase, i.e. two texts that convey the same meaning, determine which is the sarcastic one. In our solution, we utilize a fine-tuned MARBERT (Abdul-Mageed et al., 2021) model with an added single linear layer on top for classification. The proposed solution achieved 0.5076 F1-sarcastic in Arabic sub-task A, and an accuracy of 0.7450 and F-score of 0.7442 in Arabic sub-task C. We achieved the 2nd and the 9th places for Arabic sub-tasks A and C respectively.

2020

Identifying Nuanced Dialect for Arabic Tweets with Deep Learning and Reverse Translation Corpus Extension System
Rawan Tahssin | Youssef Kishk | Marwan Torki
Proceedings of the Fifth Arabic Natural Language Processing Workshop

In this paper, we present our work for the NADI Shared Task (Abdul-Mageed and Habash, 2020): Nuanced Arabic Dialect Identification for Subtask-1: country-level dialect identification. We introduce a Reverse Translation Corpus Extension System (RTCES) to handle data imbalance, along with reported results on several experimental approaches to word and document representations and different model architectures. The top scoring model was based on AraBERT (Antoun et al., 2020), with our modified extended corpus based on reverse translation of the given Arabic tweets. The selected system achieved a macro average F1 score of 20.34% on the test set, which places us 7th out of 18 teams in the final leaderboard ranking.

Arabic Dialect Identification Using BERT Fine-Tuning
Moataz Mansour | Moustafa Tohamy | Zeyad Ezzat | Marwan Torki
Proceedings of the Fifth Arabic Natural Language Processing Workshop

In the last few years, deep learning has proved to be a very effective paradigm to discover patterns in large data sets. Unfortunately, deep learning training on small data sets is not the best option because most of the time traditional machine learning algorithms could get better scores. Now, we can train the neural network on a large data set and then fine-tune it on a smaller data set using the transfer learning technique. In this paper, we present our system for the NADI Shared Task: Country-level Dialect Identification. Our system is based on fine-tuning of BERT; it achieves a 22.85 F1-score on the test set and ranks 5th out of 18 teams.

AlexU-AUX-BERT at SemEval-2020 Task 3: Improving BERT Contextual Similarity Using Multiple Auxiliary Contexts
Somaia Mahmoud | Marwan Torki
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes the system we built for SemEval-2020 task 3, which involves predicting similarity scores for a pair of words within two different contexts. Our system is based on both BERT embeddings and WordNet. We simply use cosine similarity to find the closest synset of the target words. Our results show that using this simple approach greatly improves the system behavior. Our model is ranked 3rd in subtask-2 for SemEval-2020 task 3.

AlexU-BackTranslation-TL at SemEval-2020 Task 12: Improving Offensive Language Detection Using Data Augmentation and Transfer Learning
Mai Ibrahim | Marwan Torki | Nagwa El-Makky
Proceedings of the Fourteenth Workshop on Semantic Evaluation

Social media platforms, online news commenting spaces, and many other public forums have become widely known for issues of abusive behavior such as cyber-bullying and personal attacks. In this paper, we use the annotated tweets of the Offensive Language Identification Dataset (OLID) to train three levels of deep learning classifiers to solve the three sub-tasks associated with the dataset. Sub-task A is to determine if the tweet is toxic or not. Then, for offensive tweets, sub-task B requires determining whether the toxicity is targeted. Finally, for sub-task C, we predict the target of the offense; i.e. a group, individual, or other entity. In our solution, we tackle the problem of class imbalance in the dataset by using back translation for data augmentation and utilizing the fine-tuned BERT model in an ensemble of deep learning classifiers. We used this solution to participate in the three English sub-tasks of SemEval-2020 task 12. The proposed solution achieved 0.91393, 0.6300, and 0.57607 macro F1-average in sub-tasks A, B, and C respectively. We achieved the 9th, 14th, and 22nd places for sub-tasks A, B and C respectively.

2019

Studies on Dialectical Arabic are growing more important by the day as it becomes the primary written and spoken form of Arabic online in informal settings. Among the important problems that should be explored is that of dialect identification. This paper reports different techniques that can be applied towards this goal and reports their performance on the Multi Arabic Dialect Applications and Resources (MADAR) Arabic Dialect Corpora. Our results show that improving on traditional systems using frequency-based features and non-deep-learning classifiers is a challenging task. We propose different models based on different word and document representations. Our top model is able to achieve an F1 macro averaged score of 65.66 on MADAR’s small-scale parallel corpus of 25 dialects and Modern Standard Arabic (MSA).

Question Answering Using Hierarchical Attention on Top of BERT Features
Reham Osama | Nagwa El-Makky | Marwan Torki
Proceedings of the 2nd Workshop on Machine Reading for Question Answering

The submitted model works as follows. When supplied with a question and a passage, it uses the BERT embedding along with a hierarchical attention model, which consists of two parts, co-attention and self-attention, to locate a continuous span of the passage that is the answer to the question.

2018

A Document Descriptor using Covariance of Word Vectors
Marwan Torki
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper, we address the problem of finding a novel document descriptor based on the covariance matrix of the word vectors of a document. Our descriptor has a fixed length, which makes it easy to use in many supervised and unsupervised applications. We tested our novel descriptor in different tasks including supervised and unsupervised settings. Our evaluation shows that our document covariance descriptor fits different tasks with competitive performance against state-of-the-art methods.
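
A minimal sketch of a covariance-based document descriptor, under stated assumptions: the word vectors are toy values rather than trained embeddings, the sample covariance divides by n - 1, and flattening the upper triangle of the matrix is one plausible way to obtain a fixed-length vector (the paper's exact vectorization may differ). The function name `covariance_descriptor` is hypothetical.

```python
from statistics import fmean

def covariance_descriptor(word_vectors):
    """Flatten the upper triangle of the covariance matrix of word vectors.

    For d-dimensional word vectors the result always has d * (d + 1) / 2
    entries, regardless of how many words the document contains.
    """
    n = len(word_vectors)
    if n < 2:
        raise ValueError("need at least two word vectors")
    d = len(word_vectors[0])
    # Center each dimension on its mean across the document's words.
    means = [fmean(vec[j] for vec in word_vectors) for j in range(d)]
    centered = [[vec[j] - means[j] for j in range(d)] for vec in word_vectors]
    descriptor = []
    for i in range(d):
        for j in range(i, d):
            # Sample covariance between dimensions i and j.
            cov = sum(row[i] * row[j] for row in centered) / (n - 1)
            descriptor.append(cov)
    return descriptor

short_doc = [[1.0, 2.0], [3.0, 6.0]]
long_doc = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 1.0]]
short_desc = covariance_descriptor(short_doc)
long_desc = covariance_descriptor(long_doc)
```

Both documents map to descriptors of the same length even though they contain different numbers of words, which is what lets the descriptor feed directly into standard classifiers or clustering methods.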

2017

QU-BIGIR at SemEval 2017 Task 3: Using Similarity Features for Arabic Community Question Answering Forums
Marwan Torki | Maram Hasanain | Tamer Elsayed
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

In this paper we describe our QU-BIGIR system for the Arabic subtask D of the SemEval 2017 Task 3. Our approach builds on our participation in the past version of the same subtask. This year, our system uses different similarity measures that encode lexical and semantic pairwise similarity of text pairs. In addition to well-known similarity measures such as cosine similarity, we use other measures based on the summary statistics of the word embedding representation for a given text. To rank a list of candidate question-answer pairs for a given question, we learn a linear SVM classifier over our similarity features. Our best resulting run came second in subtask D with a very competitive performance to the first-ranking system.
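
As a rough illustration of similarity features built from summary statistics of word embeddings, the sketch below pools each text's word vectors by mean and by per-dimension maximum and compares the pooled vectors with cosine similarity. The choice of these two statistics, the toy vectors, and the function names are assumptions for illustration, not the system's actual feature set; in the described pipeline such features would then be passed to a linear SVM, which is not shown here.

```python
import math

def mean_pool(vectors):
    """Per-dimension mean of a list of word vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def max_pool(vectors):
    """Per-dimension maximum of a list of word vectors."""
    return [max(col) for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def similarity_features(question_vecs, candidate_vecs):
    """Feature vector for one (question, candidate) text pair."""
    return [
        cosine(mean_pool(question_vecs), mean_pool(candidate_vecs)),
        cosine(max_pool(question_vecs), max_pool(candidate_vecs)),
    ]

question = [[1.0, 0.0], [1.0, 0.0]]
unrelated = [[0.0, 1.0]]
same_features = similarity_features(question, question)
diff_features = similarity_features(question, unrelated)
```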