Tomáš Horych


2025

pdf bib
The Promises and Pitfalls of LLM Annotations in Dataset Labeling: a Case Study on Media Bias Detection
Tomáš Horych | Christoph Mandl | Terry Ruas | Andre Greiner-Petter | Bela Gipp | Akiko Aizawa | Timo Spinde
Findings of the Association for Computational Linguistics: NAACL 2025

High annotation costs from hiring or crowdsourcing complicate the creation of large, high-quality datasets needed for training reliable text classifiers. Recent research suggests using Large Language Models (LLMs) to automate the annotation process, reducing these costs while maintaining data quality. LLMs have shown promising results in annotating downstream tasks like hate speech detection and political framing. Building on the success in these areas, this study investigates whether LLMs are viable for annotating a complex task of media bias detection and whether a downstream media bias classifier can be trained on such data. We create Annolexical, the first large-scale dataset for media bias classification with over 48k synthetically annotated examples. Our classifier fine-tuned on it surpasses all of the annotator LLMs by 5-9% in Mathew’s Correlation Coefficient (MCC) and performs close to or outperforms the model trained on human-labeled data when evaluated on two media bias benchmark datasets (BABE and BASIL). This study demonstrates how our approach significantly reduces the cost of dataset creation in the media bias domain and, by extension - the development of the classifiers, while our subsequent behavioral stress-testing reveals some of its current limitations and trade-offs.

2024

pdf bib
MAGPIE: Multi-Task Analysis of Media-Bias Generalization with Pre-Trained Identification of Expressions
Tomáš Horych | Martin Paul Wessel | Jan Philip Wahle | Terry Ruas | Jerome Waßmuth | André Greiner-Petter | Akiko Aizawa | Bela Gipp | Timo Spinde
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Media bias detection poses a complex, multifaceted problem traditionally tackled using single-task models and small in-domain datasets, consequently lacking generalizability. To address this, we introduce MAGPIE, a large-scale multi-task pre-training approach explicitly tailored for media bias detection. To enable large-scale pre-training, we construct Large Bias Mixture (LBM), a compilation of 59 bias-related tasks. MAGPIE outperforms previous approaches in media bias detection on the Bias Annotation By Experts (BABE) dataset, with a relative improvement of 3.3% F1-score. Furthermore, using a RoBERTa encoder, we show that MAGPIE needs only 15% of fine-tuning steps compared to single-task approaches. We provide insight into task learning interference and show that sentiment analysis and emotion detection help learning of all other tasks, and scaling the number of tasks leads to the best results. MAGPIE confirms that MTL is a promising approach for addressing media bias detection, enhancing the accuracy and efficiency of existing models. Furthermore, LBM is the first available resource collection focused on media bias MTL.

pdf bib
Beyond the Surface: Spurious Cues in Automatic Media Bias Detection
Martin Wessel | Tomáš Horych
Proceedings of the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion

This study investigates the robustness and generalization of transformer-based models for automatic media bias detection. We explore the behavior of current bias classifiers by analyzing feature attributions and stress-testing with adversarial datasets. The findings reveal a disproportionate focus on rare but strongly connotated words, suggesting a rather superficial understanding of linguistic bias and challenges in contextual interpretation. This problem is further highlighted by inconsistent bias assessment when stress-tested with different entities and minorities. Enhancing automatic media bias detection models is critical to improving inclusivity in media, ensuring balanced and fair representation of diverse perspectives.