Emily Öhman


Computational Exploration of the Origin of Mood in Literary Texts
Emily Öhman | Riikka H. Rossi
Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities

This paper is a methodological exploration of the origin of mood in early modern and modern Finnish literary texts using computational methods. We discuss the pre-processing steps as well as the various natural language processing tools used to try to pinpoint where mood can be best detected in text. We also share several tools and resources developed during this process. Our early attempts suggest that overall mood can be computationally detected in the first three paragraphs of a book.


The Validity of Lexicon-based Sentiment Analysis in Interdisciplinary Research
Emily Öhman
Proceedings of the Workshop on Natural Language Processing for Digital Humanities

Lexicon-based sentiment and emotion analysis methods are widely used, particularly in applied Natural Language Processing (NLP) projects in fields such as computational social science and digital humanities. These lexicon-based methods have often been criticized for their lack of validation and accuracy – sometimes fairly. However, in this paper, we argue that lexicon-based methods work well, particularly when moving up in granularity, and show how useful they can be for projects where neither qualitative analysis nor a machine learning-based approach is possible. Indeed, we argue that the measure of a lexicon’s accuracy should be grounded in its usefulness.
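
As a rough illustration of what a lexicon-based method does, the sketch below counts emotion-bearing words in a text unit and normalizes by token count. It is not the paper's implementation: `TOY_LEXICON`, `emotion_profile`, and the simple regex tokenizer are hypothetical stand-ins chosen for this example, and a real project would substitute a validated emotion lexicon and proper pre-processing.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> set of emotion labels).
# This is an illustrative assumption, not a real published resource.
TOY_LEXICON = {
    "happy": {"joy"},
    "love": {"joy", "trust"},
    "afraid": {"fear"},
    "angry": {"anger"},
}


def emotion_profile(text, lexicon=TOY_LEXICON):
    """Return emotion counts for one text unit, normalized by token count."""
    # Naive tokenization: lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for token in tokens:
        # A word may carry several emotions; words not in the lexicon add nothing.
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    total = len(tokens) or 1  # avoid division by zero for empty input
    return {emotion: n / total for emotion, n in counts.items()}


profile = emotion_profile("I love this happy ending but I am afraid of sequels")
```

Passing a longer unit (a chapter or a whole document rather than a sentence) to the same function is one simple way to aggregate at a coarser level, since individual lexicon misses matter less as the unit grows.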

Japanese Beauty Marketing on Social Media: Critical Discourse Analysis Meets NLP
Emily Öhman | Amy Gracy Metcalfe
Proceedings of the Workshop on Natural Language Processing for Digital Humanities

This project is a pilot study intending to combine traditional corpus linguistics, Natural Language Processing, critical discourse analysis, and digital humanities to gain an up-to-date understanding of how beauty is being marketed on social media, specifically Instagram, to followers. We use topic modeling combined with critical discourse analysis and NLP tools for insights into the “Japanese Beauty Myth” and show an overview of the dataset that we make publicly available.


XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection
Emily Öhman | Marc Pàmies | Kaisla Kajava | Jörg Tiedemann
Proceedings of the 28th International Conference on Computational Linguistics

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik’s core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.
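
To make the multilabel setup concrete, the following sketch encodes a sentence's annotations as a binary vector over Plutchik's eight core emotions plus neutral. The label order, the `encode` and `decode` helpers, and the example annotations are assumptions made for illustration; the actual file format and label encoding of the released dataset may differ.

```python
# Plutchik's eight core emotions, with "neutral" appended as a ninth label.
# The ordering here is an arbitrary choice for this sketch.
EMOTIONS = [
    "anger", "anticipation", "disgust", "fear",
    "joy", "sadness", "surprise", "trust",
]
LABELS = EMOTIONS + ["neutral"]


def encode(labels):
    """Turn a collection of label names into a 0/1 vector over LABELS."""
    unknown = set(labels) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if label in labels else 0 for label in LABELS]


def decode(vector):
    """Turn a 0/1 vector back into the list of active label names."""
    return [label for label, flag in zip(LABELS, vector) if flag]


# Multilabel: one sentence may carry more than one emotion at once.
vector = encode(["joy", "trust"])
```

A vector of this shape is the usual target format for a multilabel classifier, where each position is predicted independently rather than as one mutually exclusive class.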

LT@Helsinki at SemEval-2020 Task 12: Multilingual or Language-specific BERT?
Marc Pàmies | Emily Öhman | Kaisla Kajava | Jörg Tiedemann
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents the different models submitted by the LT@Helsinki team for the SemEval 2020 Shared Task 12. Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively. In both cases we used Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained by Google and fine-tuned by us on the OLID and SOLID datasets. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results.


Creating a Dataset for Multilingual Fine-grained Emotion-detection Using Gamification-based Annotation
Emily Öhman | Kaisla Kajava | Jörg Tiedemann | Timo Honkela
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection. We present a flexible tool, Sentimentator, that can be used for efficient annotation based on crowdsourcing and a self-perpetuating gold standard. We also present a novel dataset with multi-dimensional annotations of emotions and sentiments in movie subtitles that enables research on sentiment preservation across languages and the creation of robust multilingual emotion detection tools. The tools and datasets are public and open-source and can easily be extended and applied for various purposes.


The Challenges of Multi-dimensional Sentiment Analysis Across Languages
Emily Öhman | Timo Honkela | Jörg Tiedemann
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in social media channels and a multilingual emotion lexicon for fine-grained sentiment analyses. Parallel data sets make it possible to study the preservation of sentiments and emotions in translation, and our assessment reveals that the lexical approach shows great inter-language agreement. However, our manual evaluation also suggests that purely lexical methods are limited, and further studies are necessary to pinpoint the cross-lingual differences and to develop better sentiment classifiers.