XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection

Emily Öhman; Marc Pàmies; Kaisla Kajava; Jörg Tiedemann

doi:10.18653/v1/2020.coling-main.575

Abstract

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English (30k) sentences, as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We annotate the dataset with Plutchik’s core emotions plus a neutral label, creating a multilabel, multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
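
To make the multilabel setup concrete, the following minimal sketch shows one way a sentence carrying several emotions could be represented as a multi-hot vector over Plutchik's eight core emotions plus neutral. The label ordering, the label strings, and the helper names `encode` and `decode` are assumptions made for illustration; they are not taken from the XED release and may differ from its actual file format.

```python
# Plutchik's eight core emotions plus the neutral label mentioned in the abstract.
# ASSUMPTION: this ordering and these label strings are illustrative only;
# they are not claimed to match the encoding used in the XED files.
LABELS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust", "neutral",
]
LABEL_INDEX = {name: i for i, name in enumerate(LABELS)}


def encode(emotions):
    """Return a multi-hot vector with one slot per label (hypothetical helper)."""
    vector = [0] * len(LABELS)
    for name in emotions:
        vector[LABEL_INDEX[name]] = 1
    return vector


def decode(vector):
    """Return the label names whose slot is set, in LABELS order (hypothetical helper)."""
    return [LABELS[i] for i, flag in enumerate(vector) if flag]


# A single sentence may carry more than one emotion at once.
example = encode(["joy", "trust"])
print(example)
print(decode(example))
```

A multi-hot vector of this kind is a common target format for multilabel classifiers, since each label can be predicted independently of the others.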

Anthology ID: 2020.coling-main.575
Volume: Proceedings of the 28th International Conference on Computational Linguistics
Month: December
Year: 2020
Address: Barcelona, Spain (Online)
Editors: Donia Scott, Nuria Bel, Chengqing Zong
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 6542–6552
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.coling-main.575/
DOI: 10.18653/v1/2020.coling-main.575
Cite (ACL): Emily Öhman, Marc Pàmies, Kaisla Kajava, and Jörg Tiedemann. 2020. XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6542–6552, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal): XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection (Öhman et al., COLING 2020)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2020.coling-main.575.pdf
Code: Helsinki-NLP/XED
Data: XED, GoEmotions
