Liqun Liu
2026
ARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiring
Deyi Ji | Junyu Lu | Xuanyi Liu | Liqun Liu | Hailong Zhang | Peng Shu | Huan Yu | Jie Jiang | Tianrun Chen | Lanyun Zhu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Online advertising governance faces significant challenges due to the non-stationary nature of regulatory policies, where emerging mandates (e.g., restrictions on education or aesthetic anxiety) create severe label inconsistencies and reasoning ambiguities in historical datasets. In this paper, we propose ARGUS, a policy-adaptive governance system that enables evolving reinforcement through multi-agent adversarial umpiring. ARGUS addresses the sparsity of new policy data by employing a three-stage framework: (1) Policy Seeding for initial perception; (2) Adversarial Label Rectification, which utilizes a “Prosecutor-Defender-Umpire” architecture to resolve conflicts between stale labels and new mandates; and (3) Latent Knowledge Discovery, which employs a tripartite dialectical discussion to unearth sophisticated, “gray-area” violations. By leveraging RAG-enhanced policy knowledge and Chain-of-Thought synthesis as dynamic rewards for reinforcement learning, ARGUS synchronizes its reasoning pathways with evolving regulations. Extensive experiments on both industrial and public datasets demonstrate that ARGUS significantly outperforms traditional fine-tuning baselines, achieving superior policy-adaptive learning with minimal gold data.
Towards Faithful Industrial RAG: A Reinforced Co-adaptation Framework for Advertising QA
Wenwei Li | Ming Xu | Tianle Xia | Lingxiang Hu | Yiding Sun | Linfang Shang | Liqun Liu | Peng Shu | Huan Yu | Jie Jiang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Industrial advertising question answering (QA) is a high-stakes task in which hallucinated content, particularly fabricated URLs, can lead to financial loss, compliance violations, and legal risk. Although Retrieval-Augmented Generation (RAG) is widely adopted, deploying it in production remains challenging because industrial knowledge is inherently relational, frequently updated, and insufficiently aligned with generation objectives. We propose a reinforced co-adaptation framework that jointly optimizes retrieval and generation through two components: (1) Graph-aware Retrieval (GraphRAG), which models entity-relation structure over a high-citation knowledge subgraph for multi-hop, domain-specific evidence selection; and (2) evidence-constrained reinforcement learning via Group Relative Policy Optimization (GRPO) with multi-dimensional rewards covering faithfulness, style compliance, safety, and URL validity. Experiments on an internal advertising QA dataset show consistent gains across expert-judged dimensions including accuracy, completeness, and safety, while reducing the hallucination rate by 72%. A two-week online A/B test demonstrates a 28.6% increase in like rate, a 46.2% decrease in dislike rate, and a 92.7% reduction in URL hallucination. The system has been running in production for over half a year and has served millions of QA interactions.
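As a rough illustration of the group-relative idea behind GRPO mentioned in the abstract above, the sketch below scores each sampled answer against the other answers drawn for the same question. The weights in `WEIGHTS`, the function names, and the use of a weighted sum to merge the four reward dimensions are assumptions made for this example; the abstract names the dimensions but does not say how they are combined.

```python
from statistics import mean, pstdev

# Hypothetical weights: the abstract lists these reward dimensions
# but does not specify how they are weighted or combined.
WEIGHTS = {"faithfulness": 0.4, "style": 0.2, "safety": 0.2, "url_validity": 0.2}


def combined_reward(scores):
    """Collapse per-dimension scores in [0, 1] into one scalar (assumed weighted sum)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and standard deviation of its group.

    `rewards` holds one scalar per response sampled for the same prompt.
    Population standard deviation is used here as a simplifying assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Under this sketch, a response is reinforced only to the extent that it beats its siblings in the same group, which is why no separate value model is needed.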
Search-P1: Path-Centric Reward Shaping for Stable and Efficient Agentic RAG Training
Tianle Xia | Ming Xu | Lingxiang Hu | Yiding Sun | Wenwei Li | Linfang Shang | Liqun Liu | Peng Shu | Huan Yu | Jie Jiang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, yet traditional single-round retrieval struggles with complex multi-step reasoning. Agentic RAG addresses this by enabling LLMs to dynamically decide when and what to retrieve, but current RL-based training methods suffer from sparse outcome rewards that discard intermediate signals and low sample efficiency where failed samples contribute nothing. We propose Search-P1, a framework that introduces path-centric reward shaping for agentic RAG training, comprising two key components: (1) Path-Centric Reward, which evaluates the structural quality of reasoning trajectories through order-agnostic step coverage and soft scoring that extracts learning signals even from failed samples, and (2) Dual-Track Path Scoring with offline-generated reference planners that assesses paths from both self-consistency and reference-alignment perspectives. Experiments on multiple QA benchmarks demonstrate that Search-P1 achieves significant improvements over Search-R1 and other strong baselines, with an average accuracy gain of 7.7 points.
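One possible reading of "order-agnostic step coverage" with soft scoring is sketched below. It is not the paper's reward: the function names, the exact-match comparison of steps, and the `partial_credit` factor are all assumptions chosen to show how a failed sample can still yield a non-zero signal.

```python
def step_coverage(path_steps, reference_steps):
    """Fraction of reference steps that appear in the path, ignoring order."""
    reference = set(reference_steps)
    if not reference:
        return 0.0
    return len(reference & set(path_steps)) / len(reference)


def soft_path_reward(path_steps, reference_steps, answer_correct, partial_credit=0.5):
    """Full reward for a correct answer, scaled coverage otherwise.

    `partial_credit` is a hypothetical knob: it caps what a failed
    trajectory can earn so that it never outranks a correct one.
    """
    if answer_correct:
        return 1.0
    return partial_credit * step_coverage(path_steps, reference_steps)
```

For example, a trajectory that covers one of two reference steps but ends with a wrong answer would earn 0.25 under these assumed settings, rather than the 0 that a pure outcome reward assigns.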
ℛ3: Advertisement Compliance ℛectification via Group-ℛelative Experience Extractor and Curriculum ℛeinforcement
Yuan Chen | Zhenyu Hu | Mengge Xue | Cao Te | Liqun Liu | Peng Shu | Huan Yu | Jie Jiang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Rigorous content moderation is crucial for online advertising but leads to millions of daily rejections. This scale renders manual rectification infeasible, particularly for video advertisements. However, existing safety-driven methods often suffer from aggressive over-editing, which compromises the advertiser’s original semantic intent merely to satisfy compliance. In this work, we target the rectification of textual violations in video ads, covering both speech transcripts and on-screen text. We propose ℛ3, a novel framework designed to harmonize compliance with original semantic intent preservation. Our approach integrates three key innovations: (1) an experience-driven data synthesis framework that bootstraps high-quality supervision via a group-**R**elative compliance experience extractor; (2) a curriculum **R**einforcement learning strategy with hierarchical rewards designed to enforce compliance while maximizing semantic consistency; and (3) a comprehensive video **R**ectification framework seamlessly integrating text recognition, rewriting, and re-rendering for industrial deployment. Extensive experiments on industrial datasets and online A/B testing demonstrate that ℛ3 significantly outperforms state-of-the-art baselines, achieving an optimal trade-off between violation rectification and intent preservation.
SSR-A: Spatial- and Semantic-Aware Instructions and Curriculum Reinforcement for Advertisement Compliant Rectification
Cao Te | Mengge Xue | Zhenyu Hu | Yuan Chen | Liqun Liu | Peng Shu | Huan Yu | Jie Jiang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
While advertising is a cornerstone of commercial growth, it is constrained by online violation detection systems that reject millions of non-compliant items daily. Advertisers urgently require automated solutions to rectify these advertisements, especially visual ads, as manual fixing is unscalable. Although recent safety-driven methods can achieve compliance, they typically suffer from over-editing, destroying the original commercial intent and perceptual similarity. To address this, we present SSR-A, a framework tailored for the minimalist rectification of non-compliant image ads. Instead of fine-tuning image editing models directly, SSR-A focuses on translating violation policies into targeted editing instructions. We first introduce a Spatial- and Semantic-Aware Instruction Synthesis Pipeline, where MLLMs synthesize candidate instructions, incorporating spatial grounding and semantic guidance, and select the optimal instruction via multi-dimensional evaluation. Furthermore, we align the model using Curriculum Reinforcement Learning, employing GRPO with multi-faceted rewards to progressively navigate the trade-off between compliance and visual preservation. Extensive experiments and online A/B tests show that SSR-A significantly outperforms state-of-the-art baselines in both compliance and preservation of visual and commercial consistency.
2025
RAVEN++: Pinpointing Fine-Grained Violations in Advertisement Videos with Active Reinforcement Reasoning
Deyi Ji | Yuekui Yang | Liqun Liu | Peng Shu | Haiyang Wu | Shaogang Tang | Xudong Chen | Shaoping Ma | Tianrun Chen | Lanyun Zhu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Advertising (Ad) is a cornerstone of the digital economy, yet the moderation of video advertisements remains a significant challenge due to their complexity and the need for precise violation localization. While recent advancements, such as the RAVEN model, have improved coarse-grained violation detection, critical gaps persist in fine-grained understanding, explainability, and generalization. To address these limitations, we propose RAVEN++, a novel framework that introduces three key innovations: 1) Active Reinforcement Learning (RL), which dynamically adapts training to samples of varying difficulty; 2) Fine-Grained Violation Understanding, achieved through hierarchical reward functions and reasoning distillation; and 3) Progressive Multi-Stage Training, which systematically combines knowledge injection, curriculum-based passive RL, and active RL. Extensive experiments on both public and proprietary datasets, in both offline scenarios and online A/B testing, demonstrate that RAVEN++ outperforms general-purpose LLMs and specialized models like RAVEN in terms of fine-grained violation understanding, reasoning capabilities, and generalization ability.
2024
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors
Mengge Xue | Zhenyu Hu | Liqun Liu | Kuo Liao | Shuang Li | Honglin Han | Meng Zhao | Chengguo Yin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multiple-Choice Questions (MCQs) constitute a critical area of research in the study of Large Language Models (LLMs). Previous works have investigated the selection bias problem in MCQs within few-shot scenarios, in which the LLM’s performance may be influenced by the presentation of answer choices, leaving the selection bias during Supervised Fine-Tuning (SFT) unexplored. In this paper, we reveal that selection bias persists in the SFT phase, primarily due to the LLM’s inadequate Multiple Choice Symbol Binding (MCSB) ability. This limitation implies that the model struggles to associate the answer options with their corresponding symbols (e.g., A/B/C/D) effectively. To enhance the model’s MCSB capability, we first incorporate option contents into the loss function and subsequently adjust the weights of the option symbols and contents, guiding the model to understand the option content of the current symbol. Based on this, we introduce an efficient SFT algorithm for MCQs, termed Point-wise Intelligent Feedback (PIF). PIF constructs negative instances by randomly combining the incorrect option contents with all candidate symbols, and proposes a point-wise loss to feed these negative samples back into LLMs. Our experimental results demonstrate that PIF significantly reduces the model’s selection bias by improving its MCSB capability. Remarkably, PIF exhibits a substantial enhancement in the accuracy for MCQs.
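The negative-instance construction described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the dictionary layout of `options`, and the seeded sampling are hypothetical, and the point-wise loss itself is not shown.

```python
import random


def build_negative_instances(options, answer_symbol, num_negatives, seed=0):
    """Pair incorrect option contents with any candidate symbol.

    `options` maps a symbol such as "A" to its option text. Every returned
    (symbol, content) pair is a wrong answer, because the content never
    comes from the correct option.
    """
    rng = random.Random(seed)
    wrong_contents = [text for sym, text in options.items() if sym != answer_symbol]
    candidates = [(sym, text) for sym in options for text in wrong_contents]
    return rng.sample(candidates, min(num_negatives, len(candidates)))
```

With four options there are twelve such pairs to draw from, so a handful of negatives per question can be sampled without repetition.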
Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding
Kuo Liao | Shuang Li | Meng Zhao | Liqun Liu | Mengge Xue | Zhenyu Hu | Honglin Han | Chengguo Yin
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent strides in large language models (LLMs) have yielded remarkable performance, leveraging reinforcement learning from human feedback (RLHF) to significantly enhance generation and alignment capabilities. However, RLHF encounters numerous challenges, including the objective mismatch issue, leading to suboptimal performance in Natural Language Understanding (NLU) tasks. To address this limitation, we propose a novel Reinforcement Learning framework enhanced with Label-sensitive Reward (RLLR) to amplify the performance of LLMs in NLU tasks. By incorporating label-sensitive pairs into reinforcement learning, our method aims to adeptly capture nuanced label-sensitive semantic features during RL, thereby enhancing natural language understanding. Experiments conducted on five diverse foundation models across eight tasks showcase promising results. In comparison to Supervised Fine-tuning models (SFT), RLLR demonstrates an average performance improvement of 1.54%. Compared with RLHF models, the improvement averages at 0.69%. These results reveal the effectiveness of our method for LLMs in NLU tasks.
2023
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Zhe Zhao | Yudong Li | Cheng Hou | Jing Zhao | Rong Tian | Weijie Liu | Yiren Chen | Ningyuan Sun | Haoyan Liu | Weiquan Mao | Han Guo | Weigang Gou | Taiqiang Wu | Tao Zhu | Wenhang Shi | Chen Chen | Shan Huang | Sihong Chen | Liqun Liu | Feifei Li | Xiaoshuai Chen | Xingwu Sun | Zhanhui Kang | Xiaoyong Du | Linlin Shen | Kimmo Yan
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities are showing a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is the modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. As almost all common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.
2019
NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit
Liqun Liu | Funan Mu | Pengyu Li | Xin Mu | Jing Tang | Xingsheng Ai | Ran Fu | Lifeng Wang | Xing Zhou
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
In this paper, we introduce NeuralClassifier, a toolkit for neural hierarchical multi-label text classification. NeuralClassifier is designed for quick implementation of neural models for hierarchical multi-label classification tasks, which are more challenging and common in real-world scenarios. A salient feature is that NeuralClassifier currently provides a variety of text encoders, such as FastText, TextCNN, TextRNN, RCNN, VDCNN, DPCNN, DRNN, AttentiveConvNet, and Transformer encoder. It also supports other text classification scenarios, including binary-class and multi-class classification. Built on PyTorch, the core operations are calculated in batch, making the toolkit efficient with the acceleration of GPU. Experiments show that models built in our toolkit achieve comparable performance with reported results in the literature.
Co-authors
- Peng Shu 6
- Jie Jiang 5
- Huan Yu 5
- Zhenyu Hu 4
- Mengge Xue 4
- Tianrun Chen 2
- Yuan Chen 2
- Honglin Han 2
- Lingxiang Hu 2
- Deyi Ji 2
- Shuang Li 2
- Wenwei Li 2
- Kuo Liao 2
- Linfang Shang 2
- Yiding Sun 2
- Cao Te 2
- Tianle Xia 2
- Ming Xu 2
- Chengguo Yin 2
- Meng Zhao 2
- Lanyun Zhu 2
- Xingsheng Ai 1
- Chen Chen 1
- Sihong Chen 1
- Xiaoshuai Chen 1
- Xudong Chen 1
- Yiren Chen 1
- Xiaoyong Du 1
- Ran Fu 1
- Weigang Gou 1
- Han Guo 1
- Cheng Hou 1
- Shan Huang 1
- Zhanhui Kang 1
- Feifei Li 1
- Pengyu Li 1
- Yudong Li 1
- Haoyan Liu 1
- Weijie Liu 1
- Xuanyi Liu 1
- Junyu Lu 1
- Shaoping Ma 1
- Weiquan Mao 1
- Funan Mu 1
- Xin Mu 1
- Linlin Shen 1
- Wenhang Shi 1
- Ningyuan Sun 1
- Xingwu Sun 1
- Jing Tang 1
- Shaogang Tang 1
- Rong Tian 1
- Lifeng Wang 1
- Haiyang Wu 1
- Taiqiang Wu 1
- Kimmo Yan 1
- Yuekui Yang 1
- Hailong Zhang 1
- Jing Zhao 1
- Zhe Zhao 1
- Xing Zhou 1
- Tao Zhu 1