Tomas Ditchfield-Ogle


2025

This project compares methods for de- tecting violent videos, which are crucial for ensuring real-time safety in surveil- lance and digital moderation. It evaluates four approaches: a random forest classi- fier, a transformer model, and two multi- modal vision-language models. The pro- cess involves preprocessing datasets, train- ing models, and assessing accuracy, inter- pretability, scalability, and real-time suit- ability. Results show that traditional meth- ods are simple but less effective. The trans- former model achieved high accuracy, and the multimodal models offered high vio- lence recall with descriptive justifications. The study highlights trade-offs and pro- vides practical insights for the deployment of automated violence detection.