Wensong He
2025
VENUS: A VLLM-driven Video Content Discovery System for Real Application Scenarios
Minyi Zhao | Yi Liu | Jianfeng Wen | Boshen Zhang | Hailang Chang | Zhiheng Ouyang | Jie Wang | Wensong He | Shuigeng Zhou
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Video Content Discovery (VCD) aims to identify the videos that match a pre-specified text policy (or constraint), and it plays a crucial role in building a healthy, high-quality Web content ecosystem. Existing work typically relies on multiple classifiers or similarity-based systems to support VCD. However, these approaches are difficult to manage, generalize poorly, and suffer from low performance. To tackle these problems, this paper presents VENUS (short for Video contENt UnderStander), a new VCD system driven by a Vision-Language Large Model (VLLM). Concretely, we first develop an automatic policy-guided sequential annotator (APSA) to generate high-quality, VCD-specific, reasoning-equipped instruction-tuning data for model training, and then extend VLLM inference to better support VCD. We then construct a real-world VCD test set called VCD-Bench, which covers 13 policies and 57K videos. Furthermore, to evaluate its practical efficacy, we deploy VENUS in three different real-world scenarios. Extensive experiments on both VCD-Bench and public evaluation datasets for various VCD-related tasks demonstrate the superiority of VENUS over existing baselines.