Samhaa R. El-Beltagy

Also published as: Samhaa El-Beltagy

2020

ASU_OPTO at OSACT4 - Offensive Language Detection for Arabic text
Amr Keleg | Samhaa R. El-Beltagy | Mahmoud Khalil
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection

In recent years, toxic comments and offensive speech have been polluting the internet, and manual inspection of these comments is becoming a tiresome task to manage. A machine-learning-based model that can filter offensive Arabic content is much needed. In this paper, we describe the model that was submitted to the Shared Task on Offensive Language Detection organized by the 4th Workshop on Open-Source Arabic Corpora and Processing Tools. Our model makes use of a transformer-based model (BERT) to detect offensive content. We placed fourth in subtask A (detecting Offensive Speech) and third in subtask B (detecting Hate Speech).

2017

NileTMRG at SemEval-2017 Task 8: Determining Rumour and Veracity Support for Rumours on Twitter.
Omar Enayet | Samhaa R. El-Beltagy
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

Final submission for NileTMRG on RumourEval 2017.

NileTMRG at SemEval-2017 Task 4: Arabic Sentiment Analysis
Samhaa R. El-Beltagy | Mona El Kalamawy | Abu Bakr Soliman
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes two systems that were used by NileTMRG for addressing Arabic Sentiment Analysis as part of SemEval-2017, Task 4. NileTMRG participated in three Arabic-related subtasks: Subtask A (Message Polarity Classification), Subtask B (Topic-Based Message Polarity Classification) and Subtask D (Tweet Quantification). For subtask A, we made use of NU’s sentiment analyzer, which we augmented with a scored lexicon. For subtasks B and D, we used an ensemble of three different classifiers. The first classifier was a convolutional neural network that used trained (word2vec) word embeddings. The second classifier was a multilayer perceptron, while the third was a logistic regression model that takes the same input as the second classifier. Voting among the three classifiers was used to determine the final outcome. In all three Arabic-related subtasks in which NileTMRG participated, the team ranked first.
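The voting step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say how votes were combined or how ties were broken, so this assumes simple hard (majority) voting over label strings, with ties falling back to the first classifier's label. The function name `majority_vote` and the label names are hypothetical and not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine one label per classifier into a single label.

    Assumption: hard majority voting. If no label has a strict
    plurality (e.g. three classifiers, three different labels),
    fall back to the first classifier's prediction.
    """
    counts = Counter(predictions)
    top_count = max(counts.values())
    # Collect every label that reached the highest vote count.
    tied = [label for label, count in counts.items() if count == top_count]
    if len(tied) > 1:
        return predictions[0]
    return tied[0]


# Hypothetical outputs of the CNN, MLP and logistic regression models
# for a single tweet.
votes = ["positive", "negative", "positive"]
final_label = majority_vote(votes)
```

With three classifiers and three polarity classes, a three-way split is possible, which is why the sketch needs an explicit tie rule; a different rule (for example, trusting the classifier with the best validation score) would be equally plausible.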

2016

Bilingual Embeddings and Word Alignments for Translation Quality Estimation
Amal Abdelsalam | Ondřej Bojar | Samhaa El-Beltagy
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

NileTMRG at SemEval-2016 Task 5: Deep Convolutional Neural Networks for Aspect Category and Sentiment Extraction
Talaat Khalil | Samhaa R. El-Beltagy
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

NileTMRG at SemEval-2016 Task 7: Deriving Prior Polarities for Arabic Sentiment Terms
Samhaa R. El-Beltagy
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

NileULex: A Phrase and Word Level Sentiment Lexicon for Egyptian and Modern Standard Arabic
Samhaa R. El-Beltagy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents NileULex, an Arabic sentiment lexicon containing close to six thousand Arabic words and compound phrases. Forty-five percent of the terms and expressions in the lexicon are Egyptian or colloquial, while fifty-five percent are Modern Standard Arabic. While many of the terms included in the lexicon were collected automatically, the actual addition of any term was done manually. One of the important criteria for adding terms to the lexicon was that they be as unambiguous as possible. The result is a lexicon of much higher quality than any translated variant or automatically constructed one. To demonstrate that a lexicon such as this can directly impact the task of sentiment analysis, a very basic machine-learning-based sentiment analyser that uses unigrams, bigrams, and lexicon-based features was applied to two different Twitter datasets. The results obtained were compared to a baseline system that uses only unigrams and bigrams. The same lexicon-based features were also generated using a publicly available translation of a popular sentiment lexicon. The experiments show that using the developed lexicon improves the results over both the baseline and the publicly available lexicon.
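As a rough illustration of what lexicon-based features over words and compound phrases might look like, the sketch below counts positive and negative lexicon hits among a text's unigrams and bigrams. The abstract does not specify the actual feature set, so the function names, the feature names, and the toy English lexicon (standing in for Arabic entries) are all assumptions made for illustration.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexicon_features(text, lexicon):
    """Count lexicon hits among the unigrams and bigrams of `text`.

    `lexicon` maps a term or two-word phrase to "positive" or
    "negative". Assumption: overlapping matches are all counted,
    so a phrase and a word inside it can both contribute.
    """
    tokens = text.lower().split()
    candidates = ngrams(tokens, 1) + ngrams(tokens, 2)
    pos = sum(1 for term in candidates if lexicon.get(term) == "positive")
    neg = sum(1 for term in candidates if lexicon.get(term) == "negative")
    return {"pos_count": pos, "neg_count": neg, "polarity_gap": pos - neg}


# Hypothetical toy lexicon; real entries would be Arabic terms and phrases.
toy_lexicon = {"good": "positive", "not good": "negative", "awful": "negative"}
features = lexicon_features("the food was not good", toy_lexicon)
```

In this toy example the unigram "good" and the phrase "not good" both match, which shows why phrase-level entries matter: a classifier given both counts can learn that the phrase-level signal should outweigh the word-level one.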