Abstract
For the Romanian language there are some resources for automatic text comprehension, but none exist for Emotion Detection that is not lexicon-based. To cover this gap, we extracted data from Twitter and created the first dataset of tweets annotated with five classes: joy, fear, sadness, anger and neutral, intended for use in opinion mining and analysis tasks. In this article we present some features of our novel dataset and create a benchmark, obtaining the first supervised machine learning model for automatic Emotion Detection in Romanian short texts. We investigate the performance of four classical machine learning models: Multinomial Naive Bayes, Logistic Regression, Support Vector Classification and Linear Support Vector Classification. We also investigate more modern approaches such as fastText, which makes use of subword information. Lastly, we fine-tune the Romanian BERT for text classification; our experiments show that the BERT-based model performs best on the task of Emotion Detection from Romanian tweets.
Keywords: Emotion Detection, Twitter, Romanian, Supervised Machine Learning
- Anthology ID:
- 2021.ranlp-1.34
- Volume:
- Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
- Month:
- September
- Year:
- 2021
- Address:
- Held Online
- Editors:
- Ruslan Mitkov, Galia Angelova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd.
- Pages:
- 291–300
- URL:
- https://preview.aclanthology.org/add_missing_videos/2021.ranlp-1.34/
- Cite (ACL):
- Alexandra Ciobotaru and Liviu P. Dinu. 2021. RED: A Novel Dataset for Romanian Emotion Detection from Tweets. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 291–300, Held Online. INCOMA Ltd.
- Cite (Informal):
- RED: A Novel Dataset for Romanian Emotion Detection from Tweets (Ciobotaru & Dinu, RANLP 2021)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2021.ranlp-1.34.pdf
- Data
- ISEAR
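The abstract names Multinomial Naive Bayes as one of the four classical baselines but does not describe its features or preprocessing. The sketch below is a minimal, from-scratch Multinomial Naive Bayes with additive (Laplace) smoothing over whitespace tokens, meant only to show how such a five-class emotion classifier scores a tweet. It is an illustration under stated assumptions, not the authors' implementation: the tokenizer, the `alpha=1.0` smoothing value, the choice to skip unseen words at prediction time, and the ten toy "tweets" (hypothetical Romanian sentences without diacritics, not taken from the RED dataset) are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with additive smoothing (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        n = len(labels)
        # log P(class), estimated from class frequencies in the training set
        self.log_prior = {c: math.log(class_counts[c] / n) for c in self.classes}
        # total number of tokens observed for each class
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, token, c):
        # log P(token | class) with additive smoothing over the vocabulary
        v = len(self.vocab)
        return math.log(
            (self.word_counts[c][token] + self.alpha) / (self.total[c] + self.alpha * v)
        )

    def predict(self, text):
        # Words never seen in training are ignored (an assumption of this sketch).
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self.log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Hypothetical toy tweets, one pair per class; NOT from the RED dataset.
train = [
    ("sunt atat de fericit azi", "joy"),
    ("ce bucurie mare", "joy"),
    ("mi-e frica de furtuna", "fear"),
    ("ma tem de examen", "fear"),
    ("sunt trist si singur", "sadness"),
    ("plang toata ziua", "sadness"),
    ("sunt furios pe tine", "anger"),
    ("ma enerveaza traficul", "anger"),
    ("maine este luni", "neutral"),
    ("trenul pleaca la ora opt", "neutral"),
]
texts, labels = zip(*train)
model = MultinomialNB(alpha=1.0).fit(list(texts), list(labels))

print(model.predict("ce bucurie"))   # joy
print(model.predict("sunt furios"))  # anger
```

Working in log space avoids numeric underflow when many small per-word probabilities are multiplied. For real experiments, scikit-learn's `MultinomialNB`, `LogisticRegression`, `SVC` and `LinearSVC` classes cover all four classical baselines listed in the abstract, typically fed with bag-of-words or TF-IDF features; which features the authors actually used is not stated here.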