Steffen Zeiler


2025

pdf bib
MisinfoTeleGraph: Network-driven Misinformation Detection for German Telegram Messages
Lu Kalkbrenner | Veronika Solopova | Steffen Zeiler | Robert Nickel | Dorothea Kolossa
Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH)

Connectivity and message propagation are central, yet often underutilised, sources of information in misinformation detection—especially on poorly moderated platforms such as Telegram, which has become a critical channel for misinformation dissemination, namely in the German electoral context. In this paper, we introduce Misinfo-TeleGraph, the first German-language Telegram-based graph dataset for misinformation detection. It includes over 5 million messages from public channels, enriched with metadata, channel relationships, and both weak and strong labels. These labels are derived via semantic similarity to fact-checks and news articles using M3-embeddings, as well as manual annotation. To establish reproducible baselines, we evaluate both text-only models and graph neural networks (GNNs) that incorporate message forwarding as a network structure. Our results show that GraphSAGE with LSTM aggregation significantly outperforms text-only baselines in terms of Matthews Correlation Coefficient (MCC) and F1-score. We further evaluate the impact of subscribers, view counts, and automatically versus human-created labels on performance, and highlight both the potential and challenges of weak supervision in this domain. This work provides a reproducible benchmark and open dataset for future research on misinformation detection in German-language Telegram networks and other low-moderation social platforms.

2024

pdf bib
Features and Detectability of German Texts Generated with Large Language Models
Verena Irrgang | Veronika Solopova | Steffen Zeiler | Robert M. Nickel | Dorothea Kolossa
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)

2020

pdf bib
Variational Autoencoder with Embedded Student-t Mixture Model for Authorship Attribution
Benedikt Boenninghoff | Steffen Zeiler | Robert Nickel | Dorothea Kolossa
Proceedings of the 28th International Conference on Computational Linguistics

Traditional computational authorship attribution describes a classification task in a closed-set scenario. Given a finite set of candidate authors and corresponding labeled texts, the objective is to determine which of the authors has written another set of anonymous or disputed texts. In this work, we propose a probabilistic autoencoding framework to deal with this supervised classification task. Variational autoencoders (VAEs) have had tremendous success in learning latent representations. However, existing VAEs are currently still bound by limitations imposed by the assumed Gaussianity of the underlying probability distributions in the latent space. In this work, we are extending a VAE with an embedded Gaussian mixture model to a Student-t mixture model, which allows for an independent control of the “heaviness” of the respective tails of the implied probability densities. Experiments over an Amazon review dataset indicate superior performance of the proposed method.

2010

pdf bib
WAPUSK20 - A Database for Robust Audiovisual Speech Recognition
Alexander Vorwerk | Xiaohui Wang | Dorothea Kolossa | Steffen Zeiler | Reinhold Orglmeister
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Audiovisual speech recognition (AVSR) systems have been proven superior over audio-only speech recognizers in noisy environments by incorporating features of the visual modality. In order to develop reliable AVSR systems, appropriate simultaneously recorded speech and video data is needed. In this paper, we will introduce a corpus (WAPUSK20) that consists of audiovisual data of 20 speakers uttering 100 sentences each with four channels of audio and a stereoscopic video. The latter is intended to support more accurate lip tracking and the development of stereo data based normalization techniques for greater robustness of the recognition results. The sentence design has been adopted from the GRID corpus that has been widely used for AVSR experiments. Recordings have been made under acoustically realistic conditions in a usual office room. Affordable hardware equipment has been used, such as a pre-calibrated stereo camera and standard PC components. The software written to create this corpus was designed in MATLAB with help of hardware specific software provided by the hardware manufacturers and freely available open source software.