EmoPars: A Collection of 30K Emotion-Annotated Persian Social Media Texts

Nazanin Sabri, Reyhane Akhavan, Behnam Bahrak


Abstract
The wide reach of social media platforms, such as Twitter, has enabled many users to share their thoughts, opinions, and emotions on various topics online. The ability to detect these emotions automatically would allow social scientists, as well as businesses, to better understand responses from nations and customers. In this study, we introduce a dataset of 30,000 Persian Tweets labeled with Ekman’s six basic emotions (Anger, Fear, Happiness, Sadness, Hatred, and Wonder). This is the first publicly available emotion dataset in the Persian language. In this paper, we explain the data collection and labeling scheme used to create this dataset. We also analyze the dataset, showing its different features and characteristics. Among other things, we investigate the co-occurrence of different emotions in the dataset and the relationship between the sentiment and the emotion of textual instances. The dataset is publicly available at https://github.com/nazaninsbr/Persian-Emotion-Detection.
Anthology ID:
2021.ranlp-srw.23
Volume:
Proceedings of the Student Research Workshop Associated with RANLP 2021
Month:
September
Year:
2021
Address:
Online
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
167–173
URL:
https://aclanthology.org/2021.ranlp-srw.23
Cite (ACL):
Nazanin Sabri, Reyhane Akhavan, and Behnam Bahrak. 2021. EmoPars: A Collection of 30K Emotion-Annotated Persian Social Media Texts. In Proceedings of the Student Research Workshop Associated with RANLP 2021, pages 167–173, Online. INCOMA Ltd.
Cite (Informal):
EmoPars: A Collection of 30K Emotion-Annotated Persian Social Media Texts (Sabri et al., RANLP 2021)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2021.ranlp-srw.23.pdf
Code
 nazaninsbr/persian-emotion-detection
Data
EmoPars