@inproceedings{ye-etal-2023-query,
  title     = {Query-aware Multi-modal based Ranking Relevance in Video Search},
  author    = {Ye, Chengcan and
               Peng, Ting and
               Chang, Tim and
               Zhou, Zhiyi and
               Wang, Feng},
  editor    = {Wang, Mingxuan and
               Zitouni, Imed},
  booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track},
  month     = dec,
  year      = {2023},
  address   = {Singapore},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2023.emnlp-industry.31/},
  doi       = {10.18653/v1/2023.emnlp-industry.31},
  pages     = {322--330},
  abstract  = {Relevance ranking system plays a crucial role in video search on streaming platforms. Most relevance ranking methods focus on text modality, incapable of fully exploiting cross-modal cues present in video. Recent multi-modal models have demonstrated promise in various vision-language tasks but provide limited help for downstream query-video relevance tasks due to the discrepency between relevance ranking-agnostic pre-training objectives and the real video search scenarios that demand comprehensive relevance modeling. To address these challenges, we propose a QUery-Aware pre-training model with multi-modaLITY (QUALITY) that incorporates hard-mined query information as alignment targets and utilizes video tag information for guidance. QUALITY is integrated into our relevance ranking model, which leverages multi-modal knowledge and improves ranking optimization method based on ordinal regression. Extensive experiments show our proposed model significantly enhances video search performance.},
}
@comment{
Markdown (Informal)
[Query-aware Multi-modal based Ranking Relevance in Video Search](https://aclanthology.org/2023.emnlp-industry.31/) (Ye et al., EMNLP 2023)
ACL
}