We present a COVID-19 news dashboard which visualizes sentiment in pandemic news coverage in different languages across Europe. The dashboard shows analyses for positive/neutral/negative sentiment and moral sentiment for news articles across countries and languages. First we extract news articles from news-crawl. Then we use a pre-trained multilingual BERT model for sentiment analysis of news article headlines and a dictionary and word vectors -based method for moral sentiment analysis of news articles. The resulting dashboard gives a unified overview of news events on COVID-19 news overall sentiment, and the region and language of publication from the period starting from the beginning of January 2020 to the end of January 2021.
This paper presents the different models submitted by the LT@Helsinki team for the SemEval 2020 Shared Task 12. Our team participated in sub-tasks A and C; titled offensive language identification and offense target identification, respectively. In both cases we used the so-called Bidirectional Encoder Representation from Transformer (BERT), a model pre-trained by Google and fine-tuned by us on the OLID and SOLID datasets. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results.
We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik’s core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection. We present a flexible tool, Sentimentator, that can be used for efficient annotation based on crowd sourcing and a self-perpetuating gold standard. We also present a novel dataset with multi-dimensional annotations of emotions and sentiments in movie subtitles that enables research on sentiment preservation across languages and the creation of robust multilingual emotion detection tools. The tools and datasets are public and open-source and can easily be extended and applied for various purposes.