Jeremy Blackburn
2025
Evaluating Large Language Models for Detecting Antisemitism
Jay Patel | Hrudayangam Mehta | Jeremy Blackburn
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Detecting hateful content is a challenging and important problem. Automated tools, like machine-learning models, can help, but they require continuous training to adapt to the ever-changing landscape of social media. In this work, we evaluate eight open-source LLMs' capability to detect antisemitic content, specifically leveraging an in-context definition as a policy guideline. We explore various prompting techniques and design a new CoT-like prompt, Guided-CoT. Guided-CoT handles the in-context policy well, increasing performance across all evaluated models, regardless of decoding configuration, model size, or reasoning capability. Notably, Llama 3.1 70B outperforms fine-tuned GPT-3.5. Additionally, we examine LLM errors and introduce metrics to quantify semantic divergence in model-generated rationales, revealing notable differences and paradoxical behaviors among LLMs. Our experiments highlight the differences observed across LLMs' utility, explainability, and reliability.
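As a concrete illustration of the prompting setup described in the abstract, the sketch below shows how an in-context policy definition might be embedded in a CoT-style classification prompt. The policy wording, guide steps, and function names here are illustrative assumptions, not the paper's actual Guided-CoT prompt.

```python
# Minimal sketch of a CoT-style prompt that embeds an in-context policy
# definition, in the spirit of the Guided-CoT setup described above.
# The policy text and guide steps are placeholders, not the paper's prompt.

POLICY = (
    "Policy definition (placeholder): content is antisemitic if it "
    "expresses hatred toward, dehumanizes, or invokes conspiratorial "
    "tropes about Jewish people."
)

GUIDE_STEPS = [
    "Identify who or what the post targets.",
    "Check whether the target is attacked on the basis of being Jewish.",
    "Compare the post against the policy definition above.",
    "State a final label: ANTISEMITIC or NOT_ANTISEMITIC.",
]

def build_guided_cot_prompt(post: str) -> str:
    """Assemble a guided, step-by-step classification prompt."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(GUIDE_STEPS))
    return (
        f"{POLICY}\n\n"
        f"Reason through the following steps before answering:\n{steps}\n\n"
        f"Post: {post}\n"
        f"Give the reasoning for each step, then the final label."
    )

if __name__ == "__main__":
    print(build_guided_cot_prompt("example social media post"))
```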
Podcast Outcasts: Understanding Rumble’s Podcast Dynamics
Utkucan Balci | Jay Patel | Berkan Balci | Jeremy Blackburn
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
The rising popularity of podcasts as an emerging medium opens new avenues for digital humanities research, particularly when examining video-based media on alternative platforms. We present a novel data analysis pipeline for analyzing over 13K podcast videos (526 days of video content) from Rumble and YouTube that integrates advanced speech-to-text transcription, transformer-based topic modeling, and contrastive visual learning. We uncover the interplay between spoken rhetoric and visual elements in shaping political bias. Our findings reveal a distinct right-wing orientation in Rumble's podcasts, contrasting with YouTube's more diverse and apolitical content. By merging computational techniques with comparative analysis, our study advances digital humanities by demonstrating how large-scale multimodal analysis can decode ideological narratives in emerging media formats.
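A minimal sketch of a pipeline with the three stages named in the abstract follows. The concrete libraries (openai-whisper for transcription, BERTopic for topic modeling, CLIP via Hugging Face for contrastive visual embeddings) are assumptions for illustration; the paper does not specify its exact tooling here.

```python
# Sketch of a multimodal podcast-analysis pipeline: speech-to-text,
# transformer-based topic modeling, and contrastive visual embeddings.
# Library choices below are assumptions, not the paper's implementation.

import whisper
from bertopic import BERTopic
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

def transcribe(video_paths):
    """Speech-to-text: turn each podcast video into a transcript."""
    asr = whisper.load_model("base")
    return [asr.transcribe(path)["text"] for path in video_paths]

def topic_model(transcripts):
    """Transformer-based topic modeling over the transcripts."""
    model = BERTopic()
    assignments, _ = model.fit_transform(transcripts)
    return model, assignments

def frame_embeddings(frame_paths):
    """Contrastive visual embeddings for sampled video frames."""
    clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    images = [Image.open(p) for p in frame_paths]
    inputs = processor(images=images, return_tensors="pt")
    return clip.get_image_features(**inputs)
```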
2017
Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words
Joan Serrà | Ilias Leontiadis | Dimitris Spathis | Gianluca Stringhini | Jeremy Blackburn | Athena Vakali
Proceedings of the First Workshop on Abusive Language Online
Common approaches to text categorization essentially rely either on n-gram counts or on word embeddings. This presents important difficulties in highly dynamic or quickly-interacting environments, where the appearance of new words and/or varied misspellings is the norm. A paradigmatic example of this situation is abusive online behavior, with social networks and media platforms struggling to effectively combat uncommon or non-blacklisted hate words. To better deal with these issues in those fast-paced environments, we propose using the error signal of class-based language models as input to text classification algorithms. In particular, we train a next-character prediction model for any given class and then exploit the error of such class-based models to inform a neural network classifier. This way, we shift from the ‘ability to describe’ seen documents to the ‘ability to predict’ unseen content. Preliminary studies using out-of-vocabulary splits from abusive tweet data show promising results, outperforming competitive text categorization strategies by 4-11%.
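The core idea (one language model per class, with its prediction error fed to a classifier) can be sketched as below. The paper trains a neural next-character model; the add-one-smoothed character trigram model here is a simplified stand-in for illustration, and all names are hypothetical.

```python
# Sketch of the class-based prediction-error idea: fit a character-level
# language model per class, then use each model's prediction error
# (cross-entropy) on a new text as a feature vector for a downstream
# classifier. Simplified stand-in for the paper's neural char model.

import math
from collections import defaultdict

class CharTrigramLM:
    """Add-one-smoothed character trigram model for one class."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def fit(self, texts):
        for text in texts:
            padded = "  " + text
            for i in range(2, len(padded)):
                ctx, ch = padded[i - 2:i], padded[i]
                self.counts[ctx][ch] += 1
                self.vocab.add(ch)

    def cross_entropy(self, text):
        """Average negative log-probability per character (the 'error')."""
        padded = "  " + text
        v = len(self.vocab) or 1
        total, n = 0.0, 0
        for i in range(2, len(padded)):
            ctx, ch = padded[i - 2:i], padded[i]
            num = self.counts[ctx][ch] + 1
            den = sum(self.counts[ctx].values()) + v
            total += -math.log(num / den)
            n += 1
        return total / max(n, 1)

def error_features(text, class_lms):
    """Per-class prediction errors, used as input to a classifier."""
    return [lm.cross_entropy(text) for lm in class_lms]

# Usage: fit one model per class, then feed error vectors to any classifier.
abusive_lm, other_lm = CharTrigramLM(), CharTrigramLM()
abusive_lm.fit(["example abusive tweet", "another abusive post"])
other_lm.fit(["a friendly message", "an ordinary tweet"])
features = error_features("some unseen tweet", [abusive_lm, other_lm])
```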