Wiebke Petersen

2024

Team art-nat-HHU at SemEval-2024 Task 8: Stylistically Informed Fusion Model for MGT-Detection
Vittorio Ciccarelli | Cornelia Genz | Nele Mastracchio | Wiebke Petersen | Anna Stein | Hanxin Xia
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper presents our solution for subtask A of shared task 8 of SemEval 2024 for classifying human- and machine-written texts in English across multiple domains. We propose a fusion model consisting of a RoBERTa-based pre-classifier and two MLPs that have been trained to correct the pre-classifier using linguistic features. Our model achieved an accuracy of 85%.

KlarTextCoders at StaGE: Automatic Statement Annotations for German Easy Language
Akhilesh Kakolu Ramarao | Wiebke Petersen | Anna Sophia Stein | Emma Stein | Hanxin Xia
Proceedings of GermEval 2024 Shared Task on Statement Segmentation in German Easy Language (StaGE)

2023

hhuEDOS at SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS) Binary Sexism Detection (Subtask A)
Wiebke Petersen | Diem-Ly Tran | Marion Wroblewitz
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

In this paper, we describe SemEval-2023 Task 10, a shared task on detecting and predicting sexist language. The dataset consists of labeled sexist and non-sexist data targeted towards women acquired from both Reddit and Gab. We present and compare several approaches we experimented with and our final submitted model. Additional error analysis is given to recognize challenges we dealt with in our process. A total of 84 teams participated. Our model ranks 55th overall in Subtask A of the shared task.

2022

In this paper, we describe our submission to the ‘Text Complexity DE Challenge 2022’ shared task on predicting the complexity of German sentences. We compare the performance of different feature-based regression architectures and transformer language models. Our best candidate is a fine-tuned German DistilBERT model that ignores linguistic features of the sentences. Our model ranks 7th in the shared task.

The paper presents an iterative bidirectional clustering of adjectives and nouns based on a co-occurrence matrix. The clustering method combines a Vector Space Model (VSM) and a Latent Dirichlet Allocation (LDA), whose results are merged in each iterative step. The aim is to derive a clustering of German adjectives that reflects latent semantic classes of adjectives and that can be used to induce frame-based representations of nouns in a later step. We are able to show that the method induces meaningful groups of adjectives and that it outperforms a baseline k-means algorithm.