MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance

Agam Goyal, Xianyang Zhan, Yilun Chen, Koustuv Saha, Eshwar Chandrasekharan


Abstract
Large language models (LLMs) have shown great potential in flagging harmful content in online communities. Yet, existing approaches to moderation require a separate model for every community and are opaque in their decision-making, limiting real-world adoption. We introduce Mixture of Moderation Experts (MoMoE), a modular, cross-community framework that enables scalable content moderation and pairs each decision with a post-hoc explanation. MoMoE orchestrates four operators (Allocate, Predict, Aggregate, Explain) and is instantiated as seven community-specialized experts (MoMoE-Community) and five norm-violation experts (MoMoE-NormVio). On 30 unseen subreddits, the best variants obtain Micro-F1 scores of 0.72 and 0.67, respectively, matching or surpassing strong fine-tuned baselines while consistently producing concise and reliable explanations. Although community-specialized experts deliver the highest peak accuracy, norm-violation experts provide steadier performance across domains. These findings show that MoMoE yields scalable, transparent moderation without per-community fine-tuning. More broadly, they suggest that lightweight, explainable expert ensembles can guide future NLP and HCI research on trustworthy human-AI governance of online communities.
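To make the four-operator pipeline concrete, below is a minimal Python sketch of how Allocate, Predict, Aggregate, and Explain could compose into a single moderation call. Everything in it (the Expert type, the first-k routing rule, the mean-vote aggregation, and the stubbed rationale) is an illustrative assumption, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical sketch of MoMoE's four operators (Allocate, Predict,
# Aggregate, Explain). All names, the routing rule, and the mean-vote
# aggregation are illustrative assumptions, not the paper's code.

@dataclass
class Expert:
    name: str                        # e.g. a community- or norm-specialized model
    predict: Callable[[str], float]  # Predict: returns P(violation) for a comment

def allocate(comment: str, experts: List[Expert], top_k: int = 3) -> List[Expert]:
    """Allocate: choose which experts should see this comment.
    A real router would score relevance; here we simply take the first top_k."""
    return experts[:top_k]

def aggregate(scores: List[float]) -> float:
    """Aggregate: fuse the experts' predictions (simple mean vote here)."""
    return sum(scores) / len(scores)

def explain(comment: str, score: float) -> str:
    """Explain: attach a post-hoc rationale to the aggregated decision."""
    verdict = "likely violates" if score >= 0.5 else "appears to comply with"
    return f"Aggregated score {score:.2f}: the comment {verdict} community norms."

def moderate(comment: str, experts: List[Expert]) -> Tuple[bool, str]:
    chosen = allocate(comment, experts)              # Allocate
    scores = [e.predict(comment) for e in chosen]    # Predict
    score = aggregate(scores)                        # Aggregate
    return score >= 0.5, explain(comment, score)     # Explain

# Toy usage with two keyword-based stand-ins for LLM experts.
experts = [
    Expert("community:r/science", lambda c: 0.9 if "buy now" in c.lower() else 0.2),
    Expert("normvio:harassment", lambda c: 0.8 if "idiot" in c.lower() else 0.1),
]
flagged, rationale = moderate("Buy now, you idiot!", experts)
print(flagged, rationale)
```

In the paper itself each expert is an LLM and Explain produces a natural-language rationale; the sketch is only meant to show the operator composition.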
Anthology ID: 2025.emnlp-main.638
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 12656–12671
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.638/
Cite (ACL): Agam Goyal, Xianyang Zhan, Yilun Chen, Koustuv Saha, and Eshwar Chandrasekharan. 2025. MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12656–12671, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance (Goyal et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.638.pdf
Checklist: 2025.emnlp-main.638.checklist.pdf