Nazanin Sabri


2021

pdf bib
EmoPars: A Collection of 30K Emotion-Annotated Persian Social Media Texts
Nazanin Sabri | Reyhane Akhavan | Behnam Bahrak
Proceedings of the Student Research Workshop Associated with RANLP 2021

The wide reach of social media platforms, such as Twitter, have enabled many users to share their thoughts, opinions and emotions on various topics online. The ability to detect these emotions automatically would allow social scientists, as well as, businesses to better understand responses from nations and costumers. In this study we introduce a dataset of 30,000 Persian Tweets labeled with Ekman’s six basic emotions (Anger, Fear, Happiness, Sadness, Hatred, and Wonder). This is the first publicly available emotion dataset in the Persian language. In this paper, we explain the data collection and labeling scheme used for the creation of this dataset. We also analyze the created dataset, showing the different features and characteristics of the data. Among other things, we investigate co-occurrence of different emotions in the dataset, and the relationship between sentiment and emotion of textual instances. The dataset is publicly available at https://github.com/nazaninsbr/Persian-Emotion-Detection.

pdf bib
UTNLP at SemEval-2021 Task 5: A Comparative Analysis of Toxic Span Detection using Attention-based, Named Entity Recognition, and Ensemble Models
Alireza Salemi | Nazanin Sabri | Emad Kebriaei | Behnam Bahrak | Azadeh Shakery
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

Detecting which parts of a sentence contribute to that sentence’s toxicity—rather than providing a sentence-level verdict of hatefulness— would increase the interpretability of models and allow human moderators to better understand the outputs of the system. This paper presents our team’s, UTNLP, methodology and results in the SemEval-2021 shared task 5 on toxic spans detection. We test multiple models and contextual embeddings and report the best setting out of all. The experiments start with keyword-based models and are followed by attention-based, named entity- based, transformers-based, and ensemble models. Our best approach, an ensemble model, achieves an F1 of 0.684 in the competition’s evaluation phase.

2019

pdf bib
Emad at SemEval-2019 Task 6: Offensive Language Identification using Traditional Machine Learning and Deep Learning approaches
Emad Kebriaei | Samaneh Karimi | Nazanin Sabri | Azadeh Shakery
Proceedings of the 13th International Workshop on Semantic Evaluation

In this paper, the used methods and the results obtained by our team, entitled Emad, on the OffensEval 2019 shared task organized at SemEval 2019 are presented. The OffensEval shared task includes three sub-tasks namely Offensive language identification, Automatic categorization of offense types and Offense target identification. We participated in sub-task A and tried various methods including traditional machine learning methods, deep learning methods and also a combination of the first two sets of methods. We also proposed a data augmentation method using word embedding to improve the performance of our methods. The results show that the augmentation approach outperforms other methods in terms of macro-f1.