Hamada Nayel

2022

BoNC: Bag of N-Characters Model for Word Level Language Identification
Shimaa Ismail | Mai K. Gallab | Hamada Nayel
Proceedings of the 19th International Conference on Natural Language Processing (ICON): Shared Task on Word Level Language Identification in Code-mixed Kannada-English Texts

This paper describes the model submitted by the NLP_BFCAI team for the Kanglish shared task held at ICON 2022. The proposed model uses a very simple approach based on word representation. Simple machine learning classification algorithms, namely Random Forests (RF), Support Vector Machines, Stochastic Gradient Descent and Multi-Layer Perceptron, have been implemented. Our submission, RF, ranked fifth among all submissions.
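As an illustration of what a bag of character n-grams looks like for a single word, here is a minimal sketch. It is not the authors' implementation: the function name `char_ngrams`, the n-gram range and the example word are assumptions chosen for demonstration, and the abstract does not state which range the submitted system used.

```python
from collections import Counter

def char_ngrams(word, n_min=1, n_max=3):
    """Return a bag (multiset) of character n-grams for one word.

    n_min and n_max are illustrative defaults, not values from the paper.
    """
    bag = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the word.
        for i in range(len(word) - n + 1):
            bag[word[i:i + n]] += 1
    return bag

# Hypothetical example word; bigrams and trigrams only.
bag = char_ngrams("maadi", n_min=2, n_max=3)
```

Each word becomes a sparse count vector over such n-grams, which can then be passed to any of the classifiers named in the abstract.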

Word Representation Models for Arabic Dialect Identification
Mahmoud Sobhy | Ahmed H. Abu El-Atta | Ahmed A. El-Sawy | Hamada Nayel
Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)

This paper describes the systems submitted by the BFCAI team to the Nuanced Arabic Dialect Identification (NADI) shared task 2022. The dialect identification task aims at automatically detecting the source variant of a given text or speech segment. There are two subtasks in NADI 2022: the first for country-level identification and the second for sentiment analysis. Our team participated in the first subtask. The proposed systems use Term Frequency/Inverse Document Frequency (TF/IDF) and word embeddings as vectorization models. Different machine learning algorithms have been used as classifiers. The proposed systems have been tested on two test sets: Test-A and Test-B. The proposed models achieved macro-F1 scores of 21.25% and 9.71% on the Test-A and Test-B sets, respectively. On the other hand, the best-performing submitted system achieved macro-F1 scores of 36.48% and 18.95% on the Test-A and Test-B sets, respectively.
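For readers unfamiliar with the macro-F1 metric reported above, the following sketch shows how it is computed: F1 is calculated per class and then averaged with equal weight for every class. The function name `macro_f1` and the toy country labels are assumptions for illustration only and do not come from the NADI data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels (hypothetical), three classes.
y_true = ["EG", "EG", "SA", "MA"]
y_pred = ["EG", "SA", "SA", "EG"]
score = macro_f1(y_true, y_pred)
```

Because every class counts equally, rare dialects that are never predicted correctly pull the score down sharply, which is one reason macro-F1 values on many-class tasks tend to be low.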

BFCAI at SemEval-2022 Task 6: Multi-Layer Perceptron for Sarcasm Detection in Arabic Texts
Nsrin Ashraf | Fathy Elkazzaz | Mohamed Taha | Hamada Nayel | Tarek Elshishtawy
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper describes the systems submitted to the iSarcasm shared task. The aim of iSarcasm is to identify sarcastic content in Arabic and English text. Our team participated in iSarcasm for the Arabic language. A multi-layer machine learning-based model has been submitted for Arabic sarcasm detection. In this model, a TF-IDF vector space has been used for feature representation. The submitted system is simple and does not need any external resources. The test results are encouraging.

NAYEL @LT-EDI-ACL2022: Homophobia/Transphobia Detection for Equality, Diversity, and Inclusion using SVM
Nsrin Ashraf | Mohamed Taha | Ahmed Abd Elfattah | Hamada Nayel
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

Analysing the contents of social media platforms such as YouTube, Facebook and Twitter has gained interest due to the vast number of users. One of the important tasks is homophobia/transphobia detection. This paper illustrates the system submitted by our team for the shared task on homophobia/transphobia detection in social media comments. A machine learning-based model has been designed and various classification algorithms have been implemented for automatic detection of homophobia in YouTube comments. TF/IDF with an n-gram range extending to bigrams has been used for vectorization of comments. Support Vector Machines have been used to develop the proposed model, and our submission reported weighted F1-scores of 0.91, 0.92 and 0.88 for the English, Tamil and Tamil-English datasets, respectively.

2021

Machine Learning-Based Approach for Arabic Dialect Identification
Hamada Nayel | Ahmed Hassan | Mahmoud Sobhi | Ahmed El-Sawy
Proceedings of the Sixth Arabic Natural Language Processing Workshop

This paper describes our systems submitted to the Second Nuanced Arabic Dialect Identification Shared Task (NADI 2021). Dialect identification is the task of automatically detecting the source variety of a given text or speech segment. There are four subtasks: two for country-level identification and two for province-level identification. The data in this task covers a total of 100 provinces from all 21 Arab countries and comes from the Twitter domain. The proposed systems depend on five machine learning approaches, namely Complement Naïve Bayes, Support Vector Machine, Decision Tree, Logistic Regression and Random Forest classifiers. The Naïve Bayes classifier achieved a higher macro-averaged F1 score than all other classifiers on both the development and test data.

Machine Learning-Based Model for Sentiment and Sarcasm Detection
Hamada Nayel | Eslam Amer | Aya Allam | Hanya Abdallah
Proceedings of the Sixth Arabic Natural Language Processing Workshop

Within the last few years, the number of Arabic internet users and the volume of Arabic online content have grown exponentially. Dealing with Arabic datasets and the usage of non-explicit sentences to express an opinion are considered to be the major challenges in the field of natural language processing. Hence, sarcasm detection and sentiment analysis have gained major interest from the research community, especially for Arabic. Automatic sarcasm detection and sentiment analysis can be approached in three ways, namely supervised, unsupervised and hybrid. In this paper, a model based on a supervised machine learning algorithm, the Support Vector Machine (SVM), has been used for this process. The proposed model has been evaluated using the ArSarcasm-v2 dataset. The performance of the proposed model has been compared with other models submitted to the sentiment analysis and sarcasm detection shared task.

BFCAI at ComMA@ICON 2021: Support Vector Machines for Multilingual Gender Biased and Communal Language Identification
Fathy Elkazzaz | Fatma Sakr | Rasha Orban | Hamada Nayel
Proceedings of the 18th International Conference on Natural Language Processing: Shared Task on Multilingual Gender Biased and Communal Language Identification

This paper presents the system submitted by the BFCAI team to the multilingual gender biased and communal language identification shared task. The proposed model uses Support Vector Machines (SVMs) as the classification algorithm. The features have been extracted using a TF/IDF model with unigrams and bigrams. The proposed model is very simple, and no external resources are needed to build it.
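To make the feature extraction step concrete, here is a minimal sketch of TF/IDF weighting over word unigrams and bigrams. It is an illustration under stated assumptions, not the system described in the paper: the helper names `word_ngrams` and `tfidf`, the whitespace tokenizer, the toy comments and the unsmoothed weighting (raw count times log(N/df)) are all chosen for clarity. Library implementations typically add smoothing and normalization.

```python
import math
from collections import Counter

def word_ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all 1..n_max word n-grams."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def tfidf(docs, n_max=2):
    """Return one {n-gram: weight} dict per document (textbook TF/IDF)."""
    bags = [Counter(word_ngrams(d, n_max)) for d in docs]
    df = Counter()
    for bag in bags:
        df.update(bag.keys())  # document frequency: count each n-gram once per doc
    n_docs = len(docs)
    return [
        {g: count * math.log(n_docs / df[g]) for g, count in bag.items()}
        for bag in bags
    ]

# Toy comments (hypothetical).
docs = ["you are kind", "you are rude"]
vectors = tfidf(docs)
```

N-grams shared by every document, such as "you are" here, receive a weight of zero, while distinguishing n-grams such as "are kind" keep a positive weight. The resulting sparse vectors are the kind of input an SVM classifier would consume.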

2020

NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic Offensive Language Detection in Arabic Tweets
Hamada Nayel
Proceedings of the Fourteenth Workshop on Semantic Evaluation

In this paper, we present the system submitted to “SemEval-2020 Task 12”. The proposed system aims to automatically identify offensive language in Arabic tweets. A machine learning-based approach has been used to design our system. We implemented a linear classifier with Stochastic Gradient Descent (SGD) as the optimization algorithm. Our model achieved F1-scores of 84.20% and 81.82% on the development and test sets, respectively. The best-performing system and the lowest-ranked system reported F1-scores of 90.17% and 44.51% on the test set, respectively.
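The idea of a linear classifier optimized with SGD can be sketched as follows. This is a toy illustration, not the submitted system: the abstract does not name the loss function, so logistic loss is assumed here, and the function names, learning rate, epoch count and example features are all hypothetical.

```python
import math

def train_sgd(examples, epochs=20, lr=0.5):
    """Fit a linear model by SGD on logistic loss (an assumed loss).

    examples: list of (features, label), features is {name: value}, label is 0 or 1.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = bias + sum(weights.get(f, 0.0) * v for f, v in features.items())
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            error = p - label               # gradient of the loss w.r.t. z
            # Update only the weights of features present in this example.
            for f, v in features.items():
                weights[f] = weights.get(f, 0.0) - lr * error * v
            bias -= lr * error
    return weights, bias

def predict(weights, bias, features):
    z = bias + sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1 if z > 0 else 0

# Toy training data (hypothetical): 1 = offensive, 0 = not offensive.
examples = [
    ({"bad": 1.0}, 1),
    ({"good": 1.0}, 0),
    ({"bad": 1.0, "very": 1.0}, 1),
    ({"good": 1.0, "very": 1.0}, 0),
]
weights, bias = train_sgd(examples)
```

The model updates its weights after each example rather than after a full pass over the data, which keeps training cheap on large, sparse TF/IDF feature sets.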