Charlott Jakob
2026
Take It All: Ensemble Retrieval for Multimodal Evidence Aggregation
Max Upravitelev | Veronika Solopova | Premtim Sahitaj | Ariana Sahitaj | Charlott Jakob | Sebastian Möller | Vera Schmitt
Proceedings of the Ninth Fact Extraction and VERification Workshop (FEVER)
Multimodal fact checking has become increasingly important due to the predominance of visual content on social media platforms, where images are frequently used to enhance the credibility and spread of misleading claims, while generated images become more prevalent and realistic as generative models advance. Incorporating visual information, however, substantially increases computational costs, raising critical efficiency concerns for practical deployment. In this study, we propose and evaluate the ADA-AGGR (ensemble retrievAl for multimoDAl evidence AGGRegation) pipeline, which achieved second place on both the dev and test leaderboards of the FEVER 9/AVerImaTeC shared task. However, long runtimes per claim highlight efficiency challenges when designing multimodal claim verification pipelines. We therefore run extensive ablation studies and configuration analyses to identify possible performance–runtime improvements. Our experiments show that substantial efficiency gains are possible without significant loss in verification quality. For instance, we reduced the average runtime by up to 6.28× while maintaining comparable performance across evaluation metrics by aggressively downsampling input images processed by visual language models. Overall, our results highlight that careful design choices are crucial for building scalable and resource-efficient multimodal fact-checking systems suitable for real-world deployment.
News Credibility Assessment by LLMs and Humans: Implications for Political Bias
Pia Wenzel Neves | Charlott Jakob | Vera Schmitt
Proceedings of the 15th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2026)
In an era of rapid misinformation spread, LLMs have emerged as tools for assessing news credibility at scale. However, their assessments are influenced by social and cultural biases. Studies investigating political bias compare model credibility ratings with expert credibility ratings. Comparing LLMs to the perceptions of political camps extends this approach to detecting similarities in their biases. We compare LLM-generated credibility and bias ratings of news outlets with expert assessments and stratified political opinions collected through surveys. We analyse three models (Llama 3.3 70B, Mixtral 8x7B, and GPT-OSS 120B) across 47 news outlets from two countries (U.S. and Germany). We found that models demonstrated consistently high alignment with expert ratings, while showing weaker and more variable alignment with public opinions. For US-American news outlets, all models showed stronger alignment with center-left perceptions, while for German news outlets the alignment is more diverse.
2025
PolBiX: Detecting LLMs’ Political Bias in Fact-Checking through X-phemisms
Charlott Jakob | David Harbecke | Patrick Parschan | Pia Wenzel Neves | Vera Schmitt
Findings of the Association for Computational Linguistics: EMNLP 2025
Large Language Models are increasingly used in applications requiring objective assessment, which could be compromised by political bias. Many studies found preferences for left-leaning positions in LLMs, but downstream effects on tasks like fact-checking remain underexplored. In this study, we systematically investigate political bias by exchanging words with euphemisms or dysphemisms in German claims. We construct minimal pairs of factually equivalent claims that differ in political connotation to assess the consistency of LLMs in classifying them as true or false. We evaluate six LLMs and find that, more than political leaning, the presence of judgmental words significantly influences truthfulness assessment. While a few models show tendencies of political bias, this is not mitigated by explicitly calling for objectivity in prompts. Warning: This paper contains content that may be offensive or upsetting.
Overview of the SustainEval 2025 Shared Task: Identifying the Topic and Verifiability of Sustainability Report Excerpts
Jakob Prange | Charlott Jakob | Patrick Göttfert | Raphael Huber | Pia Wenzel Neves | Annemarie Friedrich
Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Workshops
2024
Augmented Political Leaning Detection: Leveraging Parliamentary Speeches for Classifying News Articles
Charlott Jakob | Pia Wenzel | Salar Mohtaj | Vera Schmitt
Proceedings of the 4th Workshop on Computational Linguistics for the Political and Social Sciences: Long and short papers
In an era where political discourse infiltrates online platforms and news media, identifying opinion is increasingly critical, especially in news articles, where objectivity is expected. Readers frequently encounter authors’ inherent political viewpoints, challenging them to discern facts from opinions. Classifying text on a spectrum from left to right is a key task for uncovering these viewpoints. Previous approaches rely on outdated datasets to classify current articles, neglecting that political opinions on certain subjects change over time. This paper explores a novel methodology for detecting political leaning in news articles by augmenting them with political speeches specific to the article’s topic and publication time. We evaluated the impact of the augmentation using BERT and Mistral models. The results show that the BERT model’s F1 score improved from a baseline of 0.82 to 0.85, while the Mistral model’s F1 score increased from 0.30 to 0.31.