Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation

Dimosthenis Antypas, Indira Sen, Carla Perez Almendros, Jose Camacho-Collados, Francesco Barbieri


Abstract
The detection of sensitive content in large datasets is crucial for ensuring that shared and analysed data is free from harmful material. However, current moderation tools, such as external APIs, suffer from limitations in customisation and in accuracy across diverse sensitive categories, as well as privacy concerns. Additionally, existing datasets and open-source models focus predominantly on toxic language, leaving gaps in detecting other sensitive categories such as substance abuse or self-harm. In this paper, we put forward a unified dataset tailored for social media content moderation across six sensitive categories: conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam. By collecting and annotating data with consistent retrieval strategies and guidelines, we address the shortcomings of previous, narrowly focused research. Our analysis demonstrates that fine-tuning large language models (LLMs) on this novel dataset yields significant improvements in detection performance compared to open off-the-shelf models such as LLaMA, and even proprietary OpenAI models, which underperform by 10-15% overall. The gap is even more pronounced for popular moderation APIs, which, among other limitations, cannot be easily tailored to specific sensitive content categories.
Anthology ID:
2025.woah-1.2
Volume:
Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
Month:
August
Year:
2025
Address:
Vienna, Austria
Editors:
Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
Venues:
WOAH | WS
Publisher:
Association for Computational Linguistics
Pages:
17–31
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.woah-1.2/
Cite (ACL):
Dimosthenis Antypas, Indira Sen, Carla Perez Almendros, Jose Camacho-Collados, and Francesco Barbieri. 2025. Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 17–31, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation (Antypas et al., WOAH 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.woah-1.2.pdf