Sebastian Reimann

2024

When is a Metaphor Actually Novel? Annotating Metaphor Novelty in the Context of Automatic Metaphor Detection
Sebastian Reimann | Tatjana Scheffler
Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)

We present an in-depth analysis of metaphor novelty, a relatively overlooked phenomenon in NLP. Novel metaphors have been analyzed via scores derived from crowdsourcing in NLP, while in theoretical work they are often defined by comparison to senses in dictionary entries. We reannotate metaphorically used words in the large VU Amsterdam Metaphor Corpus based on whether their metaphoric meaning is present in the dictionary. Based on this, we find that perceived metaphor novelty often clashes with the dictionary-based definition. We use the new labels to evaluate the performance of state-of-the-art language models for automatic metaphor detection and notice that novel metaphors according to our dictionary-based definition are easier to identify than novel metaphors according to crowd-sourced novelty scores. In a subsequent analysis, we study the correlation between high novelty scores and word frequencies in the pretraining and finetuning corpora, as well as potential problems with rare words for pre-trained language models. In line with previous work, we find a negative correlation between word frequency in the training data and novelty scores, and we link these aspects to problems with the tokenization of BERT and RoBERTa.

2022

Cause and Effect in Governmental Reports: Two Data Sets for Causality Detection in Swedish
Luise Dürlich | Sebastian Reimann | Gustav Finnveden | Joakim Nivre | Sara Stymne
Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences

Causality detection is the task of extracting information about causal relations from text. It is an important task for different types of document analysis, including political impact assessment. We present two new data sets for causality detection in Swedish. The first data set is annotated with binary relevance judgments, indicating whether a sentence contains causality information or not. In the second data set, sentence pairs are ranked for relevance with respect to a causality query, containing a specific hypothesized cause and/or effect. Both data sets are carefully curated and mainly intended for use as test data. We describe the data sets and their annotation, including detailed annotation guidelines. In addition, we present pilot experiments on cross-lingual zero-shot and few-shot causality detection, using training data from English and German.

2021

Examining the Effects of Preprocessing on the Detection of Offensive Language in German Tweets
Sebastian Reimann | Daniel Dakota
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)
