Cao Te

2026

Rigorous content moderation is crucial for online advertising but leads to millions of daily rejections. This scale renders manual rectification infeasible, particularly for video advertisements. However, existing safety-driven methods often suffer from aggressive over-editing, which compromises the advertiser’s original semantic intent merely to satisfy compliance. In this work, we target the rectification of textual violations in video ads, covering both speech transcripts and on-screen text. We propose ℛ³, a novel framework designed to harmonize compliance with original semantic intent preservation. Our approach integrates three key innovations: (1) an experience-driven data synthesis framework that bootstraps high-quality supervision via a group-**R**elative compliance experience extractor; (2) a curriculum **R**einforcement learning strategy with hierarchical rewards designed to enforce compliance while maximizing semantic consistency; and (3) a comprehensive video **R**ectification framework seamlessly integrating text recognition, rewriting, and re-rendering for industrial deployment. Extensive experiments on industrial datasets and online A/B testing demonstrate that ℛ³ significantly outperforms state-of-the-art baselines, achieving an optimal trade-off between violation rectification and intent preservation.


While advertising is a cornerstone of commercial growth, it is constrained by online violation detection systems that reject millions of non-compliant items daily. Advertisers urgently require automated solutions to rectify these advertisements, especially visual ads, as manual fixing is unscalable. Although recent safety-driven methods can achieve compliance, they typically suffer from over-editing, destroying the original commercial intent and perceptual similarity. To address this, we present SSR-A, a framework tailored for the minimalist rectification of non-compliant image ads. Instead of fine-tuning image editing models directly, SSR-A focuses on translating violation policies into targeted editing instructions. We first introduce a Spatial- and Semantic-Aware Instruction Synthesis Pipeline, where MLLMs synthesize candidate instructions (incorporating spatial grounding and semantic guidance) and select the optimal instruction via multi-dimensional evaluation. Furthermore, we align the model using Curriculum Reinforcement Learning, employing GRPO with multi-faceted rewards to progressively navigate the trade-off between compliance and visual preservation. Extensive experiments and online A/B tests show that SSR-A significantly outperforms state-of-the-art baselines in both compliance and preservation of visual and commercial consistency.

Co-authors

Mengge Xue 2

Huan Yu 2

Venues

ACL 2
