Behnam Sabeti


2020

Irony Detection in Persian Language: A Transfer Learning Approach Using Emoji Prediction
Preni Golazizian | Behnam Sabeti | Seyed Arad Ashrafi Asli | Zahra Majdabadi | Omid Momenzadeh | Reza Fahmi
Proceedings of the Twelfth Language Resources and Evaluation Conference

Irony is a linguistic device used to convey an idea while articulating an opposing expression. Many text analytics algorithms used for emotion extraction or sentiment analysis produce invalid results due to the use of irony. Persian speakers use this device more often due to the nature of the language and some cultural reasons. This phenomenon also appears on social media platforms such as Twitter, where users express their opinions in ironic or sarcastic posts. In the current research, which is the first attempt at irony detection in the Persian language, emoji prediction is used to build a pretrained model. The model is then fine-tuned on a set of hand-labeled tweets with irony tags. A bidirectional LSTM (BiLSTM) network, improved by an attention mechanism, is employed as the basis of our model. Additionally, a Persian corpus for irony detection containing 4339 manually labeled tweets is introduced. Experiments show that the proposed approach outperforms the adapted state-of-the-art method tested on the Persian dataset, with an accuracy of 83.1%, and offers a strong baseline for further research in the Persian language.
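The attention step on top of the BiLSTM can be illustrated with a minimal sketch. It assumes a simple dot-product score against a learned context vector, which is one common formulation; the paper's exact attention variant is not described here, and the names `attention_pool` and `context` are hypothetical. The small input vectors stand in for per-token BiLSTM outputs.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], context: Vector) -> Vector:
    """Collapse per-token hidden states into a single sentence vector.

    hidden:  one vector per token (stand-ins for concatenated forward and
             backward BiLSTM states).
    context: a learned query vector (hypothetical; fixed here for the demo).
    """
    # Score each token by its dot product with the context vector.
    scores = [sum(h_i * u_i for h_i, u_i in zip(h, context)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    # Weighted sum of the hidden states, one output dimension at a time.
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention_pool(hidden, [0.0, 0.0]))     # uniform weights: the plain average
print(attention_pool(hidden, [10.0, -10.0]))  # weight concentrates on the first token
```

With a zero context vector every token receives weight 1/3, so the result is simply the average of the hidden states; a non-zero context vector shifts the weight toward the tokens that score highest against it.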

Optimizing Annotation Effort Using Active Learning Strategies: A Sentiment Analysis Case Study in Persian
Seyed Arad Ashrafi Asli | Behnam Sabeti | Zahra Majdabadi | Preni Golazizian | Reza Fahmi | Omid Momenzadeh
Proceedings of the Twelfth Language Resources and Evaluation Conference

Deep learning models are the current state-of-the-art methodology for many real-world problems. However, they need a substantial amount of labeled data to be trained appropriately. Acquiring labeled data can be challenging in some particular domains or less-resourced languages. There are some practical solutions to these issues, such as Active Learning and Transfer Learning. The idea of Active Learning is simple: let the model choose the samples for annotation instead of labeling the whole dataset. This leads to a more efficient annotation process. Active Learning models can achieve the baseline performance (the accuracy of the model trained on the whole dataset) with considerably less labeled data. Several Active Learning approaches are tested in this work, and their compatibility with Persian is examined using a brand-new sentiment analysis dataset that is also introduced in this work. MirasOpinion, which to our knowledge is the largest Persian sentiment analysis dataset, is crawled from a Persian e-commerce website and annotated using a crowdsourcing policy. LDA sampling, an efficient Active Learning strategy based on Topic Modeling, is proposed in this research. Active Learning strategies have shown promising results in the Persian language, and LDA sampling showed competitive performance compared to other approaches.
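To illustrate what "letting the model choose the samples" means, the query step of a pool-based active learning loop can be sketched with least-confidence uncertainty sampling. This is a standard baseline strategy, not the LDA sampling proposed in the paper, whose details are not given here; the function name and the toy probabilities are hypothetical. In a full loop, the selected samples would be annotated, added to the training set, and the model retrained before the next query.

```python
from typing import Dict, List


def least_confidence_query(pool_probs: Dict[str, List[float]], k: int) -> List[str]:
    """Return the ids of the k unlabeled samples the model is least sure about.

    pool_probs maps a (hypothetical) sample id to the model's predicted class
    probabilities, e.g. for negative / neutral / positive sentiment.
    """
    # Least-confidence uncertainty: 1 minus the probability of the top class.
    uncertainty = {sid: 1.0 - max(p) for sid, p in pool_probs.items()}
    # Most uncertain first; ties broken by id so the result is deterministic.
    ranked = sorted(uncertainty, key=lambda sid: (-uncertainty[sid], sid))
    return ranked[:k]


# Toy predictions for four unlabeled reviews.
pool = {
    "r1": [0.90, 0.05, 0.05],
    "r2": [0.40, 0.35, 0.25],
    "r3": [0.34, 0.33, 0.33],
    "r4": [0.70, 0.20, 0.10],
}
print(least_confidence_query(pool, 2))  # ['r3', 'r2']
```

Only the near-uniform predictions (r3, r2) are sent for annotation in this round; confidently classified reviews such as r1 are left unlabeled, which is where the savings in annotation effort come from.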

Twitter Trend Extraction: A Graph-based Approach for Tweet and Hashtag Ranking, Utilizing No-Hashtag Tweets
Zahra Majdabadi | Behnam Sabeti | Preni Golazizian | Seyed Arad Ashrafi Asli | Omid Momenzadeh | Reza Fahmi
Proceedings of the Twelfth Language Resources and Evaluation Conference

Twitter has become a major platform for users to express their opinions on any topic and engage in debates. User debates and interactions usually lead to a massive amount of content about a specific topic, which is called a Trend. Twitter trend extraction aims to find these groups of relevant content generated within a short period. The most straightforward approach to this problem is to use hashtags; however, this ignores tweets without hashtags. To overcome this issue and extract trends using all tweets, we propose a graph-based approach in which graph nodes represent tweets as well as words and hashtags. More specifically, we propose a modified version of the RankClus algorithm to extract trends from the constructed tweet graph. The proposed approach is also capable of ranking the tweets, words, and hashtags in each trend with respect to their importance and relevance to the topic. The proposed algorithm is used to extract trends from several Twitter datasets, where it produced consistent and coherent results.
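The idea of ranking tweets and terms on one graph can be sketched as a mutual-reinforcement ranking on a bipartite tweet-term graph, similar in spirit to the ranking step of RankClus. This is a simplified assumption: it omits the clustering step of RankClus and the paper's modifications, and `rank_bipartite` and the toy tweets are hypothetical. Hashtags are treated as ordinary terms, so a tweet without hashtags is still connected to the graph through its words.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_bipartite(
    tweets: Dict[str, List[str]], iterations: int = 20
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Mutually reinforce tweet and term scores on a bipartite tweet-term graph.

    tweets maps a tweet id to its tokens; words and hashtags are both terms,
    so tweets without hashtags still join the graph through shared words.
    """
    # Edges of the bipartite graph, stored from the term side.
    term_to_tweets: Dict[str, List[str]] = defaultdict(list)
    for tid, terms in tweets.items():
        for term in set(terms):
            term_to_tweets[term].append(tid)

    tweet_score = {tid: 1.0 for tid in tweets}
    term_score: Dict[str, float] = {}
    for _ in range(iterations):
        # A term scores high if it occurs in high-scoring tweets.
        term_score = {t: sum(tweet_score[tid] for tid in tids)
                      for t, tids in term_to_tweets.items()}
        norm = sum(term_score.values()) or 1.0
        term_score = {t: s / norm for t, s in term_score.items()}
        # A tweet scores high if it contains high-scoring terms.
        tweet_score = {tid: sum(term_score[t] for t in set(terms))
                       for tid, terms in tweets.items()}
        norm = sum(tweet_score.values()) or 1.0
        tweet_score = {tid: s / norm for tid, s in tweet_score.items()}
    return tweet_score, term_score


tweets = {
    "t1": ["#election", "vote", "debate"],
    "t2": ["vote", "debate", "polls"],  # no hashtag
    "t3": ["#election", "vote"],
    "t4": ["weather", "rain"],          # unrelated topic
}
tweet_score, term_score = rank_bipartite(tweets)
print(max(term_score, key=term_score.get))  # vote
print(sorted(tweet_score, key=tweet_score.get, reverse=True))  # ['t1', 't2', 't3', 't4']
```

On this toy input, "vote" (shared by three tweets) receives the highest term score, the hashtag-free tweet t2 still ranks within the election group, and the unrelated tweet t4 decays toward zero because it shares no terms with the rest.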

2018

MirasText: An Automatically Generated Text Corpus for Persian
Behnam Sabeti | Hossein Abedi Firouzjaee | Ali Janalizadeh Choobbasti | S.H.E. Mortazavi Najafabadi | Amir Vaheb
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

MirasVoice: A bilingual (English-Persian) speech corpus
Amir Vaheb | Ali Janalizadeh Choobbasti | S.H.E. Mortazavi Najafabadi | Saeid Safavi | Behnam Sabeti
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)