Bharadwaj Kadiyala


2026

Recent work in natural language processing, human-computer interaction, and computational social science takes interest in the study of decentralized content moderation, in which individual communities largely determine their own norms, rules, and enforcement thereof. A key challenge to this body of work is that, once moderated, content and related variables become difficult or impossible to recover; previous work often relied on 3rd-party historical data sources, but recent world events, legal disputes, and policy shifts have significantly disrupted these services, practically disabling their research use-cases. As a result, in order to conduct new research and reproduce previous results, researchers must record content as it’s created, and monitor variables of interest over time. In this paper we present and publicly release a software system for the dynamic monitoring of Reddit posts, communities, and moderation actions, to enable scalable and reproducible research on decentralized platform governance and content moderation. To the authors’ knowledge, at the time of publication this system is the only available solution for general-purpose, real-time, policy-compliant longitudinal data collection on Reddit. Furthermore, the system’s integration with the official Reddit API enables the collection of authentication-gated data such as community engagement metrics and moderation team information, which was unavailable in previous historical data sources.