Jeremy Blackburn
2025
Podcast Outcasts: Understanding Rumble’s Podcast Dynamics
Utkucan Balci | Jay Patel | Berkan Balci | Jeremy Blackburn
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
The rising popularity of podcasts as an emerging medium opens new avenues for digital humanities research, particularly when examining video-based media on alternative platforms. We present a novel pipeline for analyzing over 13K podcast videos (526 days of video content) from Rumble and YouTube that integrates advanced speech-to-text transcription, transformer-based topic modeling, and contrastive visual learning. We uncover the interplay between spoken rhetoric and visual elements in shaping political bias. Our findings reveal a distinct right-wing orientation in Rumble’s podcasts, contrasting with YouTube’s more diverse and apolitical content. By merging computational techniques with comparative analysis, our study advances digital humanities by demonstrating how large-scale multimodal analysis can decode ideological narratives in emerging media formats.
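The pipeline the abstract describes (transcription, then topic modeling, then visual encoding) could be orchestrated roughly as below. This is a hypothetical skeleton, not the authors' code: every function name is a placeholder, and in a real implementation the stubs would be replaced by an actual speech-to-text model, a transformer-based topic model, and a contrastive visual encoder.

```python
# Hypothetical skeleton of a multimodal podcast-analysis pipeline.
# The stage functions are stubs standing in for real models; none of
# this is the paper's actual implementation.

def transcribe(video_path):
    """Stub for a speech-to-text (ASR) stage."""
    return f"transcript of {video_path}"

def topic_of(transcript):
    """Stub for transformer-based topic modeling over transcripts."""
    return "politics" if "politics" in transcript else "other"

def visual_embedding(video_path):
    """Stub for a contrastive visual encoder over sampled frames."""
    return [0.0, 0.0]  # placeholder feature vector

def analyze(video_paths):
    """Run every video through all three stages and collect records
    that downstream comparative analysis could consume."""
    records = []
    for path in video_paths:
        transcript = transcribe(path)
        records.append({
            "video": path,
            "topic": topic_of(transcript),
            "visual": visual_embedding(path),
        })
    return records

results = analyze(["rumble/ep1.mp4", "youtube/ep2.mp4"])
```

The design point is simply that each modality is handled by an independent stage, so any component (the ASR system, the topic model, the visual encoder) can be swapped without touching the others.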
2017
Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words
Joan Serrà | Ilias Leontiadis | Dimitris Spathis | Gianluca Stringhini | Jeremy Blackburn | Athena Vakali
Proceedings of the First Workshop on Abusive Language Online
Common approaches to text categorization essentially rely either on n-gram counts or on word embeddings. This presents important difficulties in highly dynamic or quickly-interacting environments, where the appearance of new words and/or varied misspellings is the norm. A paradigmatic example of this situation is abusive online behavior, with social networks and media platforms struggling to effectively combat uncommon or non-blacklisted hate words. To better deal with these issues in those fast-paced environments, we propose using the error signal of class-based language models as input to text classification algorithms. In particular, we train a next-character prediction model for any given class and then exploit the error of such class-based models to inform a neural network classifier. This way, we shift from the ‘ability to describe’ seen documents to the ‘ability to predict’ unseen content. Preliminary studies using out-of-vocabulary splits from abusive tweet data show promising results, outperforming competitive text categorization strategies by 4-11%.
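The core idea — train a next-character prediction model per class, then use each model's prediction *error* on a new document as a classification signal — can be sketched with a toy character-bigram language model. This is a minimal stand-in under stated assumptions (add-one smoothing, bigram context, toy data), not the paper's actual architecture, which feeds these errors into a neural network classifier.

```python
# Minimal sketch of class-based prediction errors: one character-level
# language model per class; a document's average next-character
# negative log-probability under each model becomes a feature.
# Bigram context and add-one smoothing are simplifying assumptions,
# not the paper's setup.
from collections import defaultdict
import math

class CharBigramLM:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def train(self, texts):
        # "^" marks the start of a document.
        for t in texts:
            for prev, nxt in zip("^" + t, t):
                self.counts[prev][nxt] += 1
                self.vocab.add(nxt)

    def error(self, text):
        """Average negative log-probability of each next character.
        Low error = the model 'predicts' this text well, i.e. the
        text resembles this class — even for unseen words."""
        V = max(len(self.vocab), 1)
        nll = 0.0
        for prev, nxt in zip("^" + text, text):
            row = self.counts.get(prev, {})
            total = sum(row.values())
            p = (row.get(nxt, 0) + 1) / (total + V)  # add-one smoothing
            nll -= math.log(p)
        return nll / max(len(text), 1)

# Hypothetical toy corpora; in the paper these per-class errors would
# be fed as input features to a neural network classifier.
abusive = CharBigramLM(); abusive.train(["you idiot", "idiot troll"])
benign = CharBigramLM(); benign.train(["have a nice day", "nice work"])
features = [abusive.error("total idiot"), benign.error("total idiot")]
```

Because the signal is character-level prediction error rather than word lookup, a misspelled or previously unseen hate word still yields a low error under the abusive-class model — which is exactly the out-of-vocabulary robustness the abstract targets.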