Hao Zhou

2026

MAVIS: Multi-Agent Video Retrieval via Structured Video Understanding
Jie Zhang | Qilang Ye | Hao Zhou | Haochen Liang | Fei Luo
Findings of the Association for Computational Linguistics: ACL 2026

The dominant paradigm in video retrieval relies on embedding-based full-corpus scanning, which suffers from inherent computational inefficiency and the semantic asymmetry between information-dense videos and sparse textual queries. To bridge this gap, we introduce **MAVIS**, a novel multi-agent framework that rethinks retrieval as cooperative reasoning rather than brute-force search. MAVIS first bridges the granularity mismatch by parsing raw videos into a **Structured Semantic Library**, enabling explicit attribute-level indexing. During retrieval, a planner decomposes complex user intents into atomic sub-tasks, dispatching specialized agents to independently nominate candidates. Crucially, MAVIS employs a **Logic-aware Debate** mechanism with a strict veto protocol, where agents collaboratively prune logical mismatches to identify a compact set of "controversial" candidates for fine-grained verification. This agentic workflow effectively bypasses the inefficiency of full-library traversal. Extensive experiments on MSR-VTT, MSVD, and ActivityNet demonstrate that MAVIS achieves competitive performance without task-specific fine-tuning, offering a scalable and interpretable alternative to traditional dual-encoder approaches.
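
The workflow the abstract describes (attribute-level index, planner, nominating agents, strict veto) can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`build_index`, `plan`, `nominate`, `debate`, `retrieve`) is hypothetical, the "agents" are plain set operations over an inverted index rather than model-driven agents, and the example library is invented for illustration.

```python
from collections import defaultdict


def build_index(library):
    """Map each attribute to the set of video ids that carry it (assumed layout)."""
    index = defaultdict(set)
    for video_id, attributes in library.items():
        for attribute in attributes:
            index[attribute].add(video_id)
    return index


def plan(include, exclude=()):
    """Hypothetical planner: one atomic sub-task per attribute constraint."""
    return [(a, True) for a in include] + [(a, False) for a in exclude]


def nominate(index, subtasks):
    """Each positive sub-task nominates candidates straight from the index."""
    nominated = set()
    for attribute, wanted in subtasks:
        if wanted:
            nominated |= index.get(attribute, set())
    return nominated


def debate(index, subtasks, candidates):
    """Strict veto: a candidate is dropped as soon as any sub-task rejects it."""
    survivors = set(candidates)
    for attribute, wanted in subtasks:
        holders = index.get(attribute, set())
        if wanted:
            survivors &= holders  # veto candidates missing a required attribute
        else:
            survivors -= holders  # veto candidates carrying an excluded attribute
    return survivors


def retrieve(library, include, exclude=()):
    """Return the ids that survive the veto, ready for fine-grained verification."""
    index = build_index(library)
    subtasks = plan(include, exclude)
    return sorted(debate(index, subtasks, nominate(index, subtasks)))


# Invented toy library: video id -> attribute set.
library = {
    "v1": {"dog", "beach", "running"},
    "v2": {"dog", "park"},
    "v3": {"cat", "beach"},
    "v4": {"dog", "beach", "night"},
}
print(retrieve(library, include=["dog", "beach"], exclude=["night"]))  # ['v1']
```

Because candidates are looked up per attribute in the index, the sketch never compares the query against every video, which is the efficiency argument the abstract makes; the surviving set is what a real system would pass on to a more expensive verification step.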
