Mapping the Landscape of Unregulated eXplicit Contents on Reddit

Msvpj Sathvik, Manan Roy Choudhury, Rishita Agarwal, Sathwik Narkedimilli, Thao Ha, Liesel Sharabi, Vivek Gupta


Abstract
The rise of online platforms has facilitated covert forms of explicit content, which pose significant challenges for detection and regulation. Often using coded language to bypass moderation, this content erodes user trust and may be associated with scams that expose users to direct financial and personal risks. In this study, we map the landscape of online explicit content posts, focusing on their categorization, linguistic strategies, and temporal and behavioral patterns as they appear on the mainstream platform Reddit. We investigated five distinct content categories: Virtual Services (VS), Physical Services (PS), Exhibitionism (Ex), Couples and Group Interactions (CGI), and Content Creation and Sales (CCS). We performed large-scale experiments using state-of-the-art large language models (LLMs) such as GPT-4, LLaMA 3.3-70B-Instruct, Gemini 1.5 Flash, Mistral 8×7B, Qwen 2.5 Turbo, and Claude 3.5 Haiku. Our work demonstrates that a nuanced classification of these services requires moving beyond simple keywords, and we establish that expressive signals such as sentiment, emotion, and tone are critical features for accurate detection. Our analysis reveals the distinct behavioral and psychosocial expression patterns that characterize each service category, providing a robust framework for future moderation.
Anthology ID:
2026.nlpcss-1.16
Volume:
Proceedings of the Seventh Workshop on Natural Language Processing and Computational Social Science
Month:
July
Year:
2026
Address:
San Diego
Editors:
Dallas Card, Anjalie Field, Katherine Keith, Julia Mendelsohn
Venues:
NLP+CSS | WS
Publisher:
Association for Computational Linguistics
Pages:
271–292
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.nlpcss-1.16/
Cite (ACL):
Msvpj Sathvik, Manan Roy Choudhury, Rishita Agarwal, Sathwik Narkedimilli, Thao Ha, Liesel Sharabi, and Vivek Gupta. 2026. Mapping the Landscape of Unregulated eXplicit Contents on Reddit. In Proceedings of the Seventh Workshop on Natural Language Processing and Computational Social Science, pages 271–292, San Diego. Association for Computational Linguistics.
Cite (Informal):
Mapping the Landscape of Unregulated eXplicit Contents on Reddit (Sathvik et al., NLP+CSS 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.nlpcss-1.16.pdf