Mengge Xue
2026
ℛ3: Advertisement Compliance ℛectification via Group-ℛelative Experience Extractor and Curriculum ℛeinforcement
Yuan Chen | Zhenyu Hu | Mengge Xue | Cao Te | Liqun Liu | Peng Shu | Huan Yu | Jie Jiang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Rigorous content moderation is crucial for online advertising but leads to millions of daily rejections. This scale renders manual rectification infeasible, particularly for video advertisements. However, existing safety-driven methods often suffer from aggressive over-editing, which compromises the advertiser’s original semantic intent merely to satisfy compliance. In this work, we target the rectification of textual violations in video ads, covering both speech transcripts and on-screen text. We propose ℛ3, a novel framework designed to reconcile compliance with preservation of the original semantic intent. Our approach integrates three key innovations: (1) an experience-driven data synthesis framework that bootstraps high-quality supervision via a group-**R**elative compliance experience extractor; (2) a curriculum **R**einforcement learning strategy with hierarchical rewards designed to enforce compliance while maximizing semantic consistency; and (3) a comprehensive video **R**ectification framework that integrates text recognition, rewriting, and re-rendering for industrial deployment. Extensive experiments on industrial datasets and online A/B testing demonstrate that ℛ3 significantly outperforms state-of-the-art baselines, achieving an optimal trade-off between violation rectification and intent preservation.
SSR-A: Spatial- and Semantic-Aware Instructions and Curriculum Reinforcement for Advertisement Compliant Rectification
Cao Te | Mengge Xue | Zhenyu Hu | Yuan Chen | Liqun Liu | Peng Shu | Huan Yu | Jie Jiang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
While advertising is a cornerstone of commercial growth, it is constrained by online violation detection systems that reject millions of non-compliant ads every day. Advertisers urgently require automated solutions to rectify these advertisements, especially visual ads, as manual fixing does not scale. Although recent safety-driven methods can achieve compliance, they typically suffer from over-editing, destroying the original commercial intent and perceptual similarity. To address this, we present SSR-A, a framework tailored for the minimalist rectification of non-compliant image ads. Instead of fine-tuning image editing models directly, SSR-A focuses on translating violation policies into targeted editing instructions. We first introduce a Spatial- and Semantic-Aware Instruction Synthesis Pipeline, in which MLLMs synthesize candidate instructions that incorporate spatial grounding and semantic guidance, and the optimal instruction is selected via multi-dimensional evaluation. Furthermore, we align the model using Curriculum Reinforcement Learning, employing GRPO with multi-faceted rewards to progressively navigate the trade-off between compliance and visual preservation. Extensive experiments and online A/B tests show that SSR-A significantly outperforms state-of-the-art baselines in both compliance and preservation of visual and commercial consistency.
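The SSR-A abstract names GRPO but gives no formulas. As a rough, non-authoritative illustration, the sketch below shows the group-relative advantage GRPO is generally described with: several outputs are sampled for one prompt, and each output's reward is standardized against the group's mean and standard deviation, so no learned critic is needed. The `mixed_reward` function and its linear schedule are hypothetical stand-ins for the paper's multi-faceted curriculum rewards, which the abstract does not specify.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # GRPO-style advantage: standardize each sampled output's reward against
    # the other outputs sampled for the same prompt (no value network).
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def mixed_reward(compliance: float, preservation: float, step: int, total_steps: int) -> float:
    # HYPOTHETICAL curriculum: early training rewards compliance only; the
    # preservation weight then grows linearly to 0.5. Not the paper's schedule.
    alpha = min(step / max(total_steps, 1), 1.0)
    w_pres = 0.5 * alpha
    return (1.0 - w_pres) * compliance + w_pres * preservation


# Usage: four edits sampled for one ad, scored by some reward model.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advantages])
```

Because advantages are centered within each group, outputs are pushed up or down only relative to their siblings, which is what lets a curriculum reweight compliance against preservation without retraining a critic.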
2024
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
Mengge Xue | Zhenyu Hu | Liqun Liu | Kuo Liao | Shuang Li | Honglin Han | Meng Zhao | Chengguo Yin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multiple-Choice Questions (MCQs) constitute a critical area of research in the study of Large Language Models (LLMs). Previous work has investigated the selection bias problem of MCQs in few-shot scenarios, where an LLM’s performance may be influenced by how the answer choices are presented, but selection bias during Supervised Fine-Tuning (SFT) has remained unexplored. In this paper, we reveal that selection bias persists in the SFT phase, primarily due to the LLM’s inadequate Multiple Choice Symbol Binding (MCSB) ability. This means the model struggles to effectively associate answer options with their corresponding symbols (e.g., A/B/C/D). To enhance the model’s MCSB capability, we first incorporate option contents into the loss function and then adjust the weights of the option symbols and contents, guiding the model to understand the option content associated with each symbol. Based on this, we introduce an efficient SFT algorithm for MCQs, termed Point-wise Intelligent Feedback (PIF). PIF constructs negative instances by randomly combining the incorrect option contents with all candidate symbols, and uses a point-wise loss to feed these negative samples back into the LLM. Our experimental results demonstrate that PIF significantly reduces the model’s selection bias by improving its MCSB capability. Remarkably, PIF also substantially improves accuracy on MCQs.
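The PIF abstract describes its negative instances and point-wise loss only in outline. The sketch below is one possible reading, under stated assumptions: each candidate symbol is paired with a randomly chosen incorrect option content to form a negative target, the correct symbol-plus-content pair is the single positive, and the "point-wise loss" is taken to be binary cross-entropy on a per-instance score (logit). The helper names, the `"A. content"` target format, and the choice of BCE are all hypothetical, not the authors' implementation.

```python
import math
import random

SYMBOLS = ["A", "B", "C", "D"]


def build_pif_instances(options: list[str], correct_idx: int,
                        rng: random.Random) -> list[tuple[str, int]]:
    # HYPOTHETICAL construction: one positive (correct symbol + its content),
    # plus one negative per candidate symbol paired with a random wrong content.
    symbols = SYMBOLS[: len(options)]
    positive = (f"{symbols[correct_idx]}. {options[correct_idx]}", 1)
    wrong = [c for i, c in enumerate(options) if i != correct_idx]
    negatives = [(f"{sym}. {rng.choice(wrong)}", 0) for sym in symbols]
    return [positive] + negatives


def _softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pointwise_loss(scores: list[float], labels: list[int]) -> float:
    # ASSUMED point-wise loss: mean binary cross-entropy with logits, i.e.
    # -log sigmoid(s) for positives and -log(1 - sigmoid(s)) for negatives.
    total = sum(_softplus(s) - y * s for s, y in zip(scores, labels))
    return total / len(scores)


# Usage with a toy question whose correct answer is option A.
rng = random.Random(0)
for text, label in build_pif_instances(["Paris", "Rome", "Berlin", "Madrid"], 0, rng):
    print(label, text)
```

Scoring each instance independently (rather than ranking options against each other) is what makes the loss point-wise: the model is penalized whenever any symbol is bound to wrong content, regardless of which symbol it is.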
Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding
Kuo Liao | Shuang Li | Meng Zhao | Liqun Liu | Mengge Xue | Zhenyu Hu | Honglin Han | Chengguo Yin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent strides in large language models (LLMs) have yielded remarkable performance, leveraging reinforcement learning from human feedback (RLHF) to significantly enhance generation and alignment capabilities. However, RLHF faces numerous challenges, including the objective mismatch issue, which leads to suboptimal performance in Natural Language Understanding (NLU) tasks. To address this limitation, we propose a novel Reinforcement Learning framework enhanced with Label-sensitive Reward (RLLR) to improve the performance of LLMs on NLU tasks. By incorporating label-sensitive pairs into reinforcement learning, our method aims to capture nuanced label-sensitive semantic features during RL, thereby enhancing natural language understanding. Experiments conducted on five diverse foundation models across eight tasks show promising results. Compared with Supervised Fine-Tuning (SFT) models, RLLR achieves an average performance improvement of 1.54%; compared with RLHF models, the average improvement is 0.69%. These results demonstrate the effectiveness of our method for LLMs on NLU tasks.