2025
pdf
bib
abs
Exploring Semantic Filtering Heuristics For Efficient Claim Verification
Max Upravitelev
|
Premtim Sahitaj
|
Arthur Hilbert
|
Veronika Solopova
|
Jing Yang
|
Nils Feldhus
|
Tatiana Anikina
|
Simon Ostermann
|
Vera Schmitt
Proceedings of the Eighth Fact Extraction and VERification Workshop (FEVER)
Given the limited computational and financial resources of news agencies, real-life usage of fact-checking systems requires fast response times. For this reason, our submission to the FEVER-8 claim verification shared task focuses on optimizing the efficiency of such pipelines built around subtasks such as evidence retrieval and veracity prediction. We propose the Semantic Filtering for Efficient Fact Checking (SFEFC) strategy, which is inspired by the FEVER-8 baseline and designed with the goal of reducing the number of LLM calls and other computationally expensive subroutines. Furthermore, we explore the reuse of cosine similarities initially calculated within a dense retrieval step to retrieve the top 10 most relevant evidence sentence sets. We use these sets for semantic filtering methods based on similarity scores and create filters for particularly hard classification labels “Not Enough Information” and “Conflicting Evidence/Cherrypicking” by identifying thresholds for potentially relevant information and the semantic variance within these sets. Compared to the parallelized FEVER-8 baseline, which takes 33.88 seconds on average to process a claim according to the FEVER-8 shared task leaderboard, our non-parallelized system remains competitive in regard to AVeriTeC retrieval scores while reducing the runtime to 7.01 seconds, achieving the fastest average runtime per claim.
pdf
bib
abs
Hybrid Annotation for Propaganda Detection: Integrating LLM Pre-Annotations with Human Intelligence
Ariana Sahitaj
|
Premtim Sahitaj
|
Veronika Solopova
|
Jiaao Li
|
Sebastian Möller
|
Vera Schmitt
Proceedings of the Fourth Workshop on NLP for Positive Impact (NLP4PI)
Propaganda detection on social media remains challenging due to task complexity and limited high-quality labeled data. This paper introduces a novel framework that combines human expertise with Large Language Model (LLM) assistance to improve both annotation consistency and scalability. We propose a hierarchical taxonomy that organizes 14 fine-grained propaganda techniques (CITATION) into three broader categories, conduct a human annotation study on the HQP dataset (CITATION) that reveals low inter-annotator agreement for fine-grained labels, and implement an LLM-assisted pre-annotation pipeline that extracts propagandistic spans, generates concise explanations, and assigns local labels as well as a global label. A secondary human verification study shows significant improvements in both agreement and time-efficiency. Building on this, we fine-tune smaller language models (SLMs) to perform structured annotation. Instead of fine-tuning on human annotations, we train on high-quality LLM-generated data, allowing a large model to produce these annotations and a smaller model to learn to generate them via knowledge distillation. Our work contributes towards the development of scalable and robust propaganda detection systems, supporting the idea of transparent and accountable media ecosystems in line with SDG 16. The code is publicly available at our GitHub repository.
pdf
bib
abs
Comparing LLMs and BERT-based Classifiers for Resource-Sensitive Claim Verification in Social Media
Max Upravitelev
|
Nicolau Duran-Silva
|
Christian Woerle
|
Giuseppe Guarino
|
Salar Mohtaj
|
Jing Yang
|
Veronika Solopova
|
Vera Schmitt
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)
The overwhelming volume of content being published at any given moment poses a significant challenge for the design of automated fact-checking (AFC) systems on social media, requiring an emphasized consideration of efficiency aspects.As in other fields, systems built upon LLMs have achieved good results on different AFC benchmarks. However, the application of LLMs is accompanied by high resource requirements. The energy consumption of LLMs poses a significant challenge from an ecological perspective, while remaining a bottleneck in latency-sensitive scenarios like AFC within social media. Therefore, we propose a system built upon fine-tuned smaller BERT-based models. When evaluated on the ClimateCheck dataset against decoder-only LLMs, our best fine-tuned model outperforms Phi 4 and approaches Qwen3 14B in reasoning mode — while significantly reducing runtime per claim. Our findings demonstrate that small encoder-only models fine-tuned for specific tasks can still provide a substantive alternative to large decoder-only LLMs, especially in efficiency-concerned settings.
pdf
bib
abs
Improving Sentiment Analysis for Ukrainian Social Media Code-Switching Data
Yurii Shynkarov
|
Veronika Solopova
|
Vera Schmitt
Proceedings of the Fourth Ukrainian Natural Language Processing Workshop (UNLP 2025)
This paper addresses the challenges of sentiment analysis in Ukrainian social media, where users frequently engage in code-switching with Russian and other languages. We introduce COSMUS (COde-Switched MUltilingual Sentiment for Ukrainian Social media), a 12,224-texts corpus collected from Telegram channels, product‐review sites and open datasets, annotated into positive, negative, neutral and mixed sentiment classes as well as language labels (Ukrainian, Russian, code-switched). We benchmark three modeling paradigms: (i) few‐shot prompting of GPT‐4o and DeepSeek V2-chat, (ii) multilingual mBERT, and (iii) the Ukrainian‐centric UkrRoberta. We also analyze calibration and LIME scores of the latter two solutions to verify its performance on various language labels. To mitigate data sparsity we test two augmentation strategies: back‐translation consistently hurts performance, whereas a Large Language Model (LLM) word‐substitution scheme yields up to +2.2% accuracy. Our work delivers the first publicly available dataset and comprehensive benchmark for sentiment classification in Ukrainian code‐switching media. Results demonstrate that language‐specific pre‐training combined with targeted augmentation yields the most accurate and trustworthy predictions in this challenging low‐resource setting.
pdf
bib
abs
MisinfoTeleGraph: Network-driven Misinformation Detection for German Telegram Messages
Lu Kalkbrenner
|
Veronika Solopova
|
Steffen Zeiler
|
Robert Nickel
|
Dorothea Kolossa
Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH)
Connectivity and message propagation are central, yet often underutilised, sources of information in misinformation detection—especially on poorly moderated platforms such as Telegram, which has become a critical channel for misinformation dissemination, namely in the German electoral context. In this paper, we introduce Misinfo-TeleGraph, the first German-language Telegram-based graph dataset for misinformation detection. It includes over 5 million messages from public channels, enriched with metadata, channel relationships, and both weak and strong labels. These labels are derived via semantic similarity to fact-checks and news articles using M3-embeddings, as well as manual annotation. To establish reproducible baselines, we evaluate both text-only models and graph neural networks (GNNs) that incorporate message forwarding as a network structure. Our results show that GraphSAGE with LSTM aggregation significantly outperforms text-only baselines in terms of Matthews Correlation Coefficient (MCC) and F1-score. We further evaluate the impact of subscribers, view counts, and automatically versus human-created labels on performance, and highlight both the potential and challenges of weak supervision in this domain. This work provides a reproducible benchmark and open dataset for future research on misinformation detection in German-language Telegram networks and other low-moderation social platforms.
2024
pdf
bib
abs
Check News in One Click: NLP-Empowered Pro-Kremlin Propaganda Detection
Veronika Solopova
|
Viktoriia Herman
|
Christoph Benzmüller
|
Tim Landgraf
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Many European citizens become targets of the Kremlin propaganda campaigns, aiming to minimise public support for Ukraine, foster a climate of mistrust and disunity, and shape elections (Meister, 2022). To address this challenge, we developed “Check News in 1 Click”, the first NLP-empowered pro-Kremlin propaganda detection application available in 7 languages, which provides the lay user with feedback on their news, and explains manipulative linguistic features and keywords. We conducted a user study, analysed user entries and models’ behaviour paired with questionnaire answers, and investigated the advantages and disadvantages of the proposed interpretative solution.
pdf
bib
Features and Detectability of German Texts Generated with Large Language Models
Verena Irrgang
|
Veronika Solopova
|
Steffen Zeiler
|
Robert M. Nickel
|
Dorothea Kolossa
Proceedings of the 20th Conference on Natural Language Processing (KONVENS 2024)
2023
pdf
bib
abs
The Evolution of Pro-Kremlin Propaganda From a Machine Learning and Linguistics Perspective
Veronika Solopova
|
Christoph Benzmüller
|
Tim Landgraf
Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
In the Russo-Ukrainian war, propaganda is produced by Russian state-run news outlets for both international and domestic audiences. Its content and form evolve and change with time as the war continues. This constitutes a challenge to content moderation tools based on machine learning when the data used for training and the current news start to differ significantly. In this follow-up study, we evaluate our previous BERT and SVM models that classify Pro-Kremlin propaganda from a Pro-Western stance, trained on the data from news articles and telegram posts at the start of 2022, on the new 2023 subset. We examine both classifiers’ errors and perform a comparative analysis of these subsets to investigate which changes in narratives provoke drops in performance.
2021
pdf
bib
abs
A German Corpus of Reflective Sentences
Veronika Solopova
|
Oana-Iuliana Popescu
|
Margarita Chikobava
|
Ralf Romeike
|
Tim Landgraf
|
Christoph Benzmüller
Proceedings of the 18th International Conference on Natural Language Processing (ICON)
Reflection about a learning process is beneficial to students in higher education (Bub-nys, 2019). The importance of machine understanding of reflective texts grows as applications supporting students become more widespread. Nevertheless, due to the sensitive content, there is no public corpus available yet for the classification of text reflectiveness. We provide the first open-access corpus of reflective student essays in German. We collected essays from three different disciplines (Software Development, Ethics of Artificial Intelligence, and Teacher Training). We annotated the corpus at sentence level with binary reflective/non-reflective labels, using an iterative annotation process with linguistic and didactic specialists, mapping the reflective components found in the data to existing schemes and complementing them. We propose and evaluate linguistic features of reflectiveness and analyse their distribution within the resulted sentences according to their labels. Our contribution constitutes the first open-access corpus to help the community towards a unified approach for reflection detection.
2020
pdf
bib
abs
Adapting Coreference Resolution to Twitter Conversations
Berfin Aktaş
|
Veronika Solopova
|
Annalena Kohnert
|
Manfred Stede
Findings of the Association for Computational Linguistics: EMNLP 2020
The performance of standard coreference resolution is known to drop significantly on Twitter texts. We improve the performance of the (Lee et al., 2018) system, which is originally trained on OntoNotes, by retraining on manually-annotated Twitter conversation data. Further experiments by combining different portions of OntoNotes with Twitter data show that selecting text genres for the training data can beat the mere maximization of training data amount. In addition, we inspect several phenomena such as the role of deictic pronouns in conversational data, and present additional results for variant settings. Our best configuration improves the performance of the”out of the box” system by 21.6%.