Panagiotis Soustas


2024

The Elsagate Corpus: Characterising Commentary on Alarming Video Content
Panagiotis Soustas | Matthew Edwards
Proceedings of the First International Conference on Natural Language Processing and Artificial Intelligence for Cyber Security

Identifying disturbing online content being targeted at children is an important content moderation problem. However, previous approaches to this problem have focused on features of the content itself, and neglected potentially helpful insights from the reactions expressed by its online audience. To help remedy this, we present the Elsagate Corpus, a collection of over 22 million comments on more than 18,000 videos that have been associated with disturbing content. We describe how we collected this corpus and present some insights from our initial explorations, including the surprisingly positive reactions from audiences to this content, some unusual non-linguistic commenting behavior of uncertain purpose, and references to some concerning themes.