Nicolas Buzeta
2025
Hate Explained: Evaluating NER-Enriched Text in Human and Machine Moderation of Hate Speech
Andres Carvallo, Marcelo Mendoza, Miguel Fernandez, Maximiliano Ojeda, Lilly Guevara, Diego Varela, Martin Borquez, Nicolas Buzeta, Felipe Ayala
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Hate speech detection is vital for creating safe online environments, as harmful content can drive social polarization. This study explores how enriching text with intent and group tags affects both machine performance and human moderation workflows. On the machine side, we trained hate speech classifiers on text enriched with intent and group tags. Intent tags were the most effective, yielding state-of-the-art F1-score improvements on the IHC, SBIC, and DH datasets. Cross-dataset evaluations further showed that intent-tagged models generalize better than other pre-trained approaches. On the human side, a user study (N=100) evaluated seven moderation settings, including intent tags, group tags, model probabilities, and randomized counterparts. Intent annotations significantly improved moderator accuracy, allowing moderators to outperform machine classifiers by 12.9%. Moderators also rated intent tags as the most useful explanation tool, with a 41% increase in perceived helpfulness over the control group. Overall, our findings show that intent-based annotations enhance both machine classification performance and human moderation workflows.
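To make the idea of "enriching text with intent and group tags" concrete, here is a minimal sketch of how such tags might be attached to a post before it is passed to a classifier. The tag delimiters (`[INTENT]`, `[GROUP]`), the `Post` structure, and the `enrich` helper are all hypothetical illustrations; the abstract does not specify the paper's actual enrichment format or pipeline.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical tag delimiters; the paper's actual enrichment format is not given here.
INTENT_OPEN, INTENT_CLOSE = "[INTENT]", "[/INTENT]"
GROUP_OPEN, GROUP_CLOSE = "[GROUP]", "[/GROUP]"


@dataclass
class Post:
    text: str
    intent: str = ""  # hypothetical free-text description of the author's intent
    groups: List[str] = field(default_factory=list)  # targeted groups, if any


def enrich(post: Post, use_intent: bool = True, use_groups: bool = False) -> str:
    """Return the post text with optional intent and group tags appended.

    The tagged string would be fed to an ordinary text classifier in place
    of the raw text. Empty intent or group fields are simply skipped.
    """
    parts = [post.text]
    if use_intent and post.intent:
        parts.append(f"{INTENT_OPEN} {post.intent} {INTENT_CLOSE}")
    if use_groups and post.groups:
        parts.append(f"{GROUP_OPEN} {', '.join(post.groups)} {GROUP_CLOSE}")
    return " ".join(parts)


if __name__ == "__main__":
    example = Post("example post", intent="mocking a group", groups=["group A"])
    print(enrich(example))  # intent-tagged variant
    print(enrich(example, use_intent=False, use_groups=True))  # group-tagged variant
```

Keeping the tags inline in the text, rather than as separate model inputs, lets the same off-the-shelf text classifier be trained on raw, intent-tagged, or group-tagged variants simply by toggling the flags.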