Thao Ha

2026

The rise of online platforms has facilitated covert forms of explicit content, which pose significant challenges for detection and regulation. Such content often uses coded language to bypass moderation, erodes user trust, and may be linked to scams that expose users to direct financial and personal harm. In this study, we map the landscape of explicit content posts on the mainstream platform Reddit, focusing on their categorization, linguistic strategies, and temporal and behavioral patterns. We investigated five distinct content categories, namely Virtual Services (VS), Physical Services (PS), Exhibitionism (Ex), Couples and Group Interactions (CGI), and Content Creation and Sales (CCS), and performed large-scale experiments with state-of-the-art large language models (LLMs), including GPT-4, LLaMA 3.3-70B-Instruct, Gemini 1.5 Flash, Mixtral 8×7B, Qwen 2.5 Turbo, and Claude 3.5 Haiku. Our work demonstrates that nuanced classification of these services requires moving beyond simple keywords, and we establish that expressive signals such as sentiment, emotion, and tone are critical features for accurate detection. Our analysis reveals the distinct behavioral and psychosocial expression patterns that characterize each service category, providing a robust framework for future moderation.

Co-authors

Liesel Sharabi

Venues

NLP+CSS
WS
