Chengcan Ye
2023
Query-aware Multi-modal based Ranking Relevance in Video Search
Chengcan Ye
|
Ting Peng
|
Tim Chang
|
Zhiyi Zhou
|
Feng Wang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Relevance ranking system plays a crucial role in video search on streaming platforms. Most relevance ranking methods focus on text modality, incapable of fully exploiting cross-modal cues present in video. Recent multi-modal models have demonstrated promise in various vision-language tasks but provide limited help for downstream query-video relevance tasks due to the discrepency between relevance ranking-agnostic pre-training objectives and the real video search scenarios that demand comprehensive relevance modeling. To address these challenges, we propose a QUery-Aware pre-training model with multi-modaLITY (QUALITY) that incorporates hard-mined query information as alignment targets and utilizes video tag information for guidance. QUALITY is integrated into our relevance ranking model, which leverages multi-modal knowledge and improves ranking optimization method based on ordinal regression. Extensive experiments show our proposed model significantly enhances video search performance.