2019
pdf
abs
Syntax-Ignorant N-gram Embeddings for Sentiment Analysis of Arabic Dialects
Hala Mulki
|
Hatem Haddad
|
Mourad Gridach
|
Ismail Babaoğlu
Proceedings of the Fourth Arabic Natural Language Processing Workshop
Arabic sentiment analysis models have employed compositional embedding features to represent the Arabic dialectal content. These embeddings are usually composed via ordered, syntax-aware composition functions and learned within deep neural frameworks. With the free word order and the varying syntax nature across the different Arabic dialects, a sentiment analysis system developed for one dialect might not be efficient for the others. Here we present syntax-ignorant n-gram embeddings to be used in sentiment analysis of several Arabic dialects. The proposed embeddings were composed and learned using an unordered composition function and a shallow neural model. Five datasets of different dialects were used to evaluate the produced embeddings in the sentiment analysis task. The obtained results revealed that, our syntax-ignorant embeddings could outperform word2vec model and doc2vec both variant models in addition to hand-crafted system baselines, while a competent performance was noticed towards baseline systems that adopted more complicated neural architectures.
2017
pdf
abs
Churn Identification in Microblogs using Convolutional Neural Networks with Structured Logical Knowledge
Mourad Gridach
|
Hatem Haddad
|
Hala Mulki
Proceedings of the 3rd Workshop on Noisy User-generated Text
For brands, gaining new customer is more expensive than keeping an existing one. Therefore, the ability to keep customers in a brand is becoming more challenging these days. Churn happens when a customer leaves a brand to another competitor. Most of the previous work considers the problem of churn prediction using the Call Detail Records (CDRs). In this paper, we use micro-posts to classify customers into churny or non-churny. We explore the power of convolutional neural networks (CNNs) since they achieved state-of-the-art in various computer vision and NLP applications. However, the robustness of end-to-end models has some limitations such as the availability of a large amount of labeled data and uninterpretability of these models. We investigate the use of CNNs augmented with structured logic rules to overcome or reduce this issue. We developed our system called Churn_teacher by using an iterative distillation method that transfers the knowledge, extracted using just the combination of three logic rules, directly into the weight of the DNNs. Furthermore, we used weight normalization to speed up training our convolutional neural networks. Experimental results showed that with just these three rules, we were able to get state-of-the-art on publicly available Twitter dataset about three Telecom brands.
pdf
abs
Tw-StAR at SemEval-2017 Task 4: Sentiment Classification of Arabic Tweets
Hala Mulki
|
Hatem Haddad
|
Mourad Gridach
|
Ismail Babaoglu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
In this paper, we present our contribution in SemEval 2017 international workshop. We have tackled task 4 entitled “Sentiment analysis in Twitter”, specifically subtask 4A-Arabic. We propose two Arabic sentiment classification models implemented using supervised and unsupervised learning strategies. In both models, Arabic tweets were preprocessed first then various schemes of bag-of-N-grams were extracted to be used as features. The final submission was selected upon the best performance achieved by the supervised learning-based model. However, the results obtained by the unsupervised learning-based model are considered promising and evolvable if more rich lexica are adopted in further work.
2016
pdf
abs
Character-Aware Neural Networks for Arabic Named Entity Recognition for Social Media
Mourad Gridach
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
Named Entity Recognition (NER) is the task of classifying or labelling atomic elements in the text into categories such as Person, Location or Organisation. For Arabic language, recognizing named entities is a challenging task because of the complexity and the unique characteristics of this language. In addition, most of the previous work focuses on Modern Standard Arabic (MSA), however, recognizing named entities in social media is becoming more interesting these days. Dialectal Arabic (DA) and MSA are both used in social media, which is deemed as another challenging task. Most state-of-the-art Arabic NER systems count heavily on handcrafted engineering features and lexicons which is time consuming. In this paper, we introduce a novel neural network architecture which benefits both from character- and word-level representations automatically, by using combination of bidirectional LSTM and Conditional Random Field (CRF), eliminating the need for most feature engineering. Moreover, our model relies on unsupervised word representations learned from unannotated corpora. Experimental results demonstrate that our model achieves state-of-the-art performance on publicly available benchmark for Arabic NER for social media and surpassing the previous system by a large margin.
2011
pdf
Developing a New System for Arabic Morphological Analysis and Generation
Mourad Gridach
|
Noureddine Chenfour
Proceedings of the 2nd Workshop on South Southeast Asian Natural Language Processing (WSSANLP)