VENUS: A VLLM-driven Video Content Discovery System for Real Application Scenarios

Minyi Zhao; Yi Liu; Jianfeng Wen; Boshen Zhang; Hailang Chang; Zhiheng Ouyang; Jie Wang; Wensong He; Shuigeng Zhou

Abstract

Video Content Discovery (VCD) aims to identify the videos that match a pre-specified text policy (or constraint), and it plays a crucial role in building a healthy, high-quality Web content ecosystem. Existing approaches typically employ multiple classifiers or similarity-based systems to support VCD. However, these approaches are difficult to manage, generalize poorly, and suffer from low performance. To tackle these problems, this paper presents a new Vision-Language Large Model (VLLM)-driven VCD system called VENUS (short for Video contENt UnderStander). Concretely, we first develop an automatic policy-guided sequential annotator (APSA) to generate high-quality, VCD-specific, reasoning-equipped instruction-tuning data for model training, and then extend VLLM inference to better support VCD. We then construct a real-world VCD test set called VCD-Bench, which covers 13 policies and 57K videos. Furthermore, to evaluate the practical efficacy of VENUS, we deploy it in three different real-world scenarios. Extensive experiments on both VCD-Bench and public evaluation datasets for various VCD-related tasks demonstrate the superiority of VENUS over existing baselines.

Anthology ID: 2025.emnlp-industry.4
Volume: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month: November
Year: 2025
Address: Suzhou (China)
Editors: Saloni Potdar, Lina Rojas-Barahona, Sebastien Montella
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 50–64
URL: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.4/
Cite (ACL): Minyi Zhao, Yi Liu, Jianfeng Wen, Boshen Zhang, Hailang Chang, Zhiheng Ouyang, Jie Wang, Wensong He, and Shuigeng Zhou. 2025. VENUS: A VLLM-driven Video Content Discovery System for Real Application Scenarios. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 50–64, Suzhou (China). Association for Computational Linguistics.
Cite (Informal): VENUS: A VLLM-driven Video Content Discovery System for Real Application Scenarios (Zhao et al., EMNLP 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-industry.4.pdf
