Haiyang Wu
Advertisement (Ad) video violation detection is critical for ensuring platform compliance, but existing methods struggle with precise temporal grounding, noisy annotations, and limited generalization. We propose RAVEN, a novel framework that integrates curriculum reinforcement learning with multimodal large language models (MLLMs) to enhance reasoning and cognitive capabilities for violation detection. RAVEN employs a progressive training strategy that combines precisely and coarsely annotated data, and leverages Group Relative Policy Optimization (GRPO) to develop emergent reasoning abilities without explicit reasoning annotations. A sophisticated hierarchical reward mechanism, composed of multiple components, ensures precise temporal grounding and consistent category prediction. Experiments on industrial datasets and public benchmarks show that RAVEN achieves superior performance in violation category accuracy and temporal interval localization. We also design a pipeline to deploy RAVEN in online Ad services, and online A/B testing further validates its practical applicability, with significant improvements in precision and recall. RAVEN also demonstrates strong generalization, mitigating the catastrophic forgetting issue associated with supervised fine-tuning.
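To make the GRPO-plus-hierarchical-reward idea concrete, here is a minimal Python sketch of how group-relative advantages could be computed from a reward that couples category correctness with temporal localization. The reward design and the names `hierarchical_reward` and `grpo_advantages` are illustrative assumptions, not RAVEN's actual implementation.

```python
import numpy as np

def temporal_iou(pred, gold):
    """Intersection-over-union of two [start, end] intervals (seconds)."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = max(pred[1], gold[1]) - min(pred[0], gold[0])
    return inter / union if union > 0 else 0.0

def hierarchical_reward(pred_category, pred_span, gold_category, gold_span):
    """Hypothetical hierarchical reward: category correctness gates the
    finer-grained temporal term, so localization only counts when the
    coarser decision is already right."""
    r_cat = 1.0 if pred_category == gold_category else 0.0
    r_loc = temporal_iou(pred_span, gold_span)
    return r_cat + r_cat * r_loc

def grpo_advantages(rewards):
    """GRPO-style advantages: normalize rewards within one sampled group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: four sampled completions for the same ad clip.
preds = [("nudity", (3.0, 8.0)), ("nudity", (2.5, 9.0)),
         ("gambling", (3.0, 8.0)), ("nudity", (0.0, 1.0))]
gold = ("nudity", (3.0, 8.5))
rewards = [hierarchical_reward(c, s, *gold) for c, s in preds]
print(grpo_advantages(rewards))
```

Gating the IoU term on category correctness is one simple way to encode a hierarchy of objectives; the published reward may weight or combine these terms differently.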
Advertising (Ad) is a cornerstone of the digital economy, yet the moderation of video advertisements remains a significant challenge due to their complexity and the need for precise violation localization. While recent advancements, such as the RAVEN model, have improved coarse-grained violation detection, critical gaps persist in fine-grained understanding, explainability, and generalization. To address these limitations, we propose RAVEN++, a novel framework that introduces three key innovations: 1) Active Reinforcement Learning (RL), which dynamically adapts training to samples of varying difficulty; 2) Fine-Grained Violation Understanding, achieved through hierarchical reward functions and reasoning distillation; and 3) Progressive Multi-Stage Training, which systematically combines knowledge injection, curriculum-based passive RL, and active RL. Extensive experiments on both public and proprietary datasets, covering both offline scenarios and online deployed A/B testing, demonstrate that RAVEN++ outperforms general-purpose LLMs and specialized models like RAVEN in terms of fine-grained violation understanding, reasoning capabilities, and generalization ability.
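As a rough illustration of the active RL component, the sketch below selects training samples by the reward variance of their sampled rollout groups: a group where every rollout earns the same reward yields zero group-relative advantage and thus no learning signal. The difficulty proxy and the selection rule are assumptions made for exposition, not RAVEN++'s published procedure.

```python
import numpy as np

def sample_difficulty(group_rewards):
    """Proxy for difficulty: reward spread across a group of rollouts.
    Near-zero spread means the sample is either solved or hopeless,
    so GRPO-style updates learn nothing from it."""
    return np.asarray(group_rewards, dtype=float).std()

def select_active_batch(pool, k):
    """Pick the k samples whose rollout groups are most informative."""
    scored = sorted(pool, key=lambda item: sample_difficulty(item["rewards"]),
                    reverse=True)
    return scored[:k]

pool = [
    {"id": "ad_01", "rewards": [1.0, 1.0, 1.0, 1.0]},  # solved: no signal
    {"id": "ad_02", "rewards": [0.0, 1.0, 0.2, 0.9]},  # contested: informative
    {"id": "ad_03", "rewards": [0.0, 0.0, 0.0, 0.0]},  # hopeless for now
]
print([s["id"] for s in select_active_batch(pool, k=1)])  # -> ['ad_02']
```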
The sequential recommendation task involves predicting the items users will be interested in next based on their past interaction sequence. Recently, sequential recommender systems with generative retrieval have garnered significant attention. However, during training, these generative recommenders focus only on maximizing the prediction probability of the next target item in the temporal sequence, while neglecting awareness of diverse plausible potential items. Although introducing large language models (LLMs) with world knowledge, along with a set of auxiliary tasks that link item identifiers to their real-world meanings, can alleviate this issue, the high inference costs of these LLM-based recommenders make them challenging to deploy in practical scenarios. In this paper, we propose a novel learning framework, LOHRec, which leverages the order and hierarchy in generative recommendation with quantized identifiers to further explore the performance ceiling of lightweight generative recommenders. Under fair comparisons with approximately equal backbone parameter sizes, comprehensive experiments show that all variants of generative recommenders using our framework outperform strong prior baselines across multiple datasets. Furthermore, we empirically demonstrate that LOHRec can efficiently align lightweight generative recommenders with LLM recommendation preferences in low-resource scenarios, underscoring its practical utility. Our code repository is available at [https://github.com/xjw-nlp/LOHRec](https://github.com/xjw-nlp/LOHRec).
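A minimal sketch of how hierarchy in quantized identifiers might be exploited during training: each item is represented by a sequence of codebook tokens ordered from coarse to fine (as in RQ-VAE-style semantic IDs), and earlier levels are weighted more heavily in the loss. The level weights and the function `hierarchical_nll` are hypothetical; LOHRec's actual objective may differ.

```python
import torch
import torch.nn.functional as F

def hierarchical_nll(logits, target_codes, level_weights=(1.0, 0.7, 0.5, 0.3)):
    """Level-weighted loss over a quantized identifier (one token per
    codebook level). Earlier levels carry coarser semantics, so mistakes
    there are penalized more.
    logits: (batch, levels, vocab); target_codes: (batch, levels)."""
    loss = 0.0
    for lvl, w in enumerate(level_weights):
        loss = loss + w * F.cross_entropy(logits[:, lvl, :], target_codes[:, lvl])
    return loss / sum(level_weights)

# Toy check: batch of 2 items, 4-level identifiers, 256-way codebooks.
logits = torch.randn(2, 4, 256)
codes = torch.randint(0, 256, (2, 4))
print(hierarchical_nll(logits, codes))
```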
Chinese poetry generation is a very challenging task in natural language processing. In this paper, we propose a novel two-stage poetry generation method that first plans the sub-topics of the poem according to the user's writing intent, and then generates each line of the poem sequentially, using a modified recurrent neural network encoder-decoder framework. The proposed planning-based method ensures that the generated poem is coherent and semantically consistent with the user's intent. A comprehensive evaluation with human judgments demonstrates that our proposed approach outperforms state-of-the-art poetry generation methods and that the quality of the generated poems is somewhat comparable to that of human poets.
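The two-stage plan-then-generate pipeline can be summarized in a short sketch: a planner expands the user's intent into one sub-topic keyword per line, and a decoder generates each line conditioned on its keyword plus the lines written so far. Both functions below are toy stand-ins (the paper uses a learned planner and an RNN encoder-decoder), shown only to make the control flow explicit.

```python
def plan_subtopics(intent, n_lines=4):
    """Stage 1 (hypothetical stub): expand the user's intent into one
    sub-topic keyword per poem line; the paper learns this expansion."""
    keywords = intent.split()
    return (keywords * n_lines)[:n_lines]  # placeholder for a learned planner

def generate_poem(intent, decode_line):
    """Stage 2: generate each line conditioned on its sub-topic keyword
    plus all previously generated lines."""
    lines = []
    for topic in plan_subtopics(intent):
        lines.append(decode_line(topic, lines))
    return "\n".join(lines)

# Toy decoder standing in for the trained encoder-decoder.
toy = lambda topic, history: f"[line about '{topic}' given {len(history)} prior lines]"
print(generate_poem("spring river moon night", toy))
```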