Gerhard Hagerer


2021

End-to-End Annotator Bias Approximation on Crowdsourced Single-Label Sentiment Analysis
Gerhard Hagerer | David Szabo | Andreas Koch | Maria Luisa Ripoll Dominguez | Christian Widmer | Maximilian Wich | Hannah Danner | Georg Groh
Proceedings of the Fourth International Conference on Natural Language and Speech Processing (ICNLSP 2021)

SocialVisTUM: An Interactive Visualization Toolkit for Correlated Neural Topic Models on Social Media Opinion Mining
Gerhard Hagerer | Martin Kirchhoff | Hannah Danner | Robert Pesch | Mainak Ghosh | Archishman Roy | Jiaxi Zhao | Georg Groh
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Recent research in opinion mining has proposed word embedding-based topic modeling methods that provide superior coherence compared to traditional topic modeling. In this paper, we demonstrate how these methods can be used to display correlated topic models on social media texts using SocialVisTUM, our proposed interactive visualization toolkit. It displays a graph with topics as nodes and their correlations as edges. Further details are displayed interactively to support the exploration of large text collections, e.g., representative words and sentences of topics, topic and sentiment distributions, hierarchical topic clustering, and customizable, predefined topic labels. The toolkit automatically optimizes its hyperparameters on custom data to maximize topic coherence. We show a working instance of the toolkit on data crawled from English social media discussions about organic food consumption. The visualization confirms the findings of a qualitative consumer research study. SocialVisTUM and its training procedures are accessible online.
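
As a rough illustration of the node-edge structure described above, the following sketch computes topic correlations from a document-topic matrix and serializes them as a graph. The function name, JSON layout, and correlation threshold are illustrative assumptions, not SocialVisTUM's actual API.

import json
import numpy as np

def topic_correlation_graph(doc_topic, topic_labels, threshold=0.3):
    # doc_topic: (n_docs, n_topics) matrix of per-document topic weights.
    # Correlate topics across documents; strong correlations become edges.
    corr = np.corrcoef(doc_topic, rowvar=False)
    nodes = [{"id": i, "label": topic_labels[i],
              "weight": float(doc_topic[:, i].mean())}
             for i in range(corr.shape[0])]
    edges = [{"source": i, "target": j, "weight": float(corr[i, j])}
             for i in range(corr.shape[0])
             for j in range(i + 1, corr.shape[0])
             if corr[i, j] > threshold]
    return json.dumps({"nodes": nodes, "edges": edges}, indent=2)

A graph library or web front end can then render the resulting JSON, with node size reflecting topic prevalence and edge weight reflecting correlation strength.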

Investigating Annotator Bias in Abusive Language Datasets
Maximilian Wich | Christian Widmer | Gerhard Hagerer | Georg Groh
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Social media platforms nowadays use classification models to cope with hate speech and abusive language. A key weakness of these models is their vulnerability to bias. A prevalent form of bias in hate speech and abusive language datasets is annotator bias, caused by the annotators' subjective perception and the complexity of the annotation task. In this paper, we develop a set of methods to measure annotator bias in abusive language datasets and to identify different perspectives on abusive language. We apply these methods to four different abusive language datasets. Our proposed approach supports the annotation processes of such datasets and future research addressing different perspectives on the perception of abusive language.
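
The abstract does not spell out the concrete measures, so the following is a hypothetical illustration only: one common building block for such analyses is each annotator's agreement with the majority vote of the remaining annotators, where a low score flags a diverging perspective. All names below are hypothetical, not the paper's method.

from collections import Counter
from sklearn.metrics import cohen_kappa_score

def annotator_agreement(labels):
    # labels: dict mapping annotator name -> list of labels over the same items.
    annotators = list(labels)
    n_items = len(labels[annotators[0]])
    scores = {}
    for a in annotators:
        # Majority vote of all other annotators, item by item.
        majority = [Counter(labels[b][i] for b in annotators if b != a)
                    .most_common(1)[0][0] for i in range(n_items)]
        # Cohen's kappa between this annotator and the rest of the group.
        scores[a] = cohen_kappa_score(labels[a], majority)
    return scores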

2020

An Evaluation of Progressive Neural Networks for Transfer Learning in Natural Language Processing
Abdul Moeed | Gerhard Hagerer | Sumit Dugar | Sarthak Gupta | Mainak Ghosh | Hannah Danner | Oliver Mitevski | Andreas Nawroth | Georg Groh
Proceedings of the Twelfth Language Resources and Evaluation Conference

A major challenge for modern neural networks is utilizing previous knowledge for new tasks in an effective manner, otherwise known as transfer learning. Fine-tuning, the most widely used method for achieving this, suffers from catastrophic forgetting. The problem is often exacerbated in natural language processing (NLP). In this work, we assess progressive neural networks (PNNs) as an alternative to fine-tuning. The evaluation is based on common NLP tasks such as sequence labeling and text classification. By evaluating PNNs across a range of architectures, datasets, and tasks, we observe improvements over the baselines throughout all experiments.
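
For readers unfamiliar with PNNs, here is a minimal PyTorch sketch of the core idea: each new task gets a fresh column, while frozen columns from earlier tasks feed in through lateral connections, so old knowledge is reused without being overwritten. Layer sizes and layout are illustrative assumptions, not the paper's exact architectures.

import torch
import torch.nn as nn

class ProgressiveColumn(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, prev_column=None):
        super().__init__()
        self.prev = prev_column
        if self.prev is not None:
            for p in self.prev.parameters():
                p.requires_grad = False  # freeze the earlier task's weights
        self.h1 = nn.Linear(in_dim, hidden_dim)
        self.h2 = nn.Linear(hidden_dim, hidden_dim)
        # Lateral adapter from the previous column's first hidden layer.
        self.lateral = (nn.Linear(hidden_dim, hidden_dim, bias=False)
                        if prev_column is not None else None)
        self.out = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        a1 = torch.relu(self.h1(x))
        z2 = self.h2(a1)
        if self.prev is not None:
            with torch.no_grad():  # the frozen column is inference-only
                prev_a1 = torch.relu(self.prev.h1(x))
            z2 = z2 + self.lateral(prev_a1)  # lateral connection
        return self.out(torch.relu(z2))

# Task 1 trains col1; task 2 adds col2, which reuses col1's frozen features.
col1 = ProgressiveColumn(in_dim=128, hidden_dim=64, out_dim=5)
col2 = ProgressiveColumn(in_dim=128, hidden_dim=64, out_dim=3, prev_column=col1)

Because col1's parameters stay frozen, training col2 cannot degrade performance on the first task, which is how PNNs sidestep catastrophic forgetting.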

Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings
Abdul Moeed | Yang An | Gerhard Hagerer | Georg Groh
Proceedings of the Twelfth Language Resources and Evaluation Conference

With the explosive growth in textual data, it is becoming increasingly important to summarize text automatically. Recently, generative language models have shown promise in abstractive text summarization tasks. Since these models rephrase text and thus use similar but different words than those found in the summarized text, existing metrics such as ROUGE that rely on n-gram overlap may not be optimal. Therefore, we evaluate two embedding-based evaluation metrics that are applicable to abstractive summarization: Fréchet embedding distance, which has been introduced recently, and angular embedding similarity, which is our proposed metric. To demonstrate the utility of both metrics, we analyze the headline generation capacity of two state-of-the-art language models: GPT-2 and ULMFiT. In particular, our proposed metric shows a close relation to human judgments in our experiments and has overall better correlations with them. To ensure reproducibility, the source code and the human assessments of our experiments are available on GitHub.
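
Under their common definitions, both metrics are straightforward to compute. The sketch below fits Gaussians to reference and generated embedding sets for the Fréchet distance and normalizes the angle between embedding vectors for angular similarity; the paper's exact formulations may differ in detail.

import numpy as np
from scipy.linalg import sqrtm

def frechet_embedding_distance(emb_a, emb_b):
    # Fréchet distance between Gaussians fitted to two embedding sets,
    # the same form as the FID score used for images.
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # numerical noise can leave tiny imaginary parts
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean)

def angular_similarity(u, v):
    # 1 - normalized angle between u and v; 1.0 means identical direction.
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return 1.0 - np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi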