Abstract
Relevance ranking systems play a crucial role in video search on streaming platforms. Most relevance ranking methods focus on the text modality and are incapable of fully exploiting the cross-modal cues present in videos. Recent multi-modal models have demonstrated promise on various vision-language tasks but offer limited help for downstream query-video relevance tasks, owing to the discrepancy between their relevance-ranking-agnostic pre-training objectives and real video search scenarios, which demand comprehensive relevance modeling. To address these challenges, we propose a QUery-Aware pre-training model with multi-modaLITY (QUALITY) that incorporates hard-mined query information as alignment targets and utilizes video tag information for guidance. QUALITY is integrated into our relevance ranking model, which leverages multi-modal knowledge and improves the ranking optimization method based on ordinal regression. Extensive experiments show that our proposed model significantly enhances video search performance.
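The paper itself ships no code; as a rough illustration of the ordinal-regression idea the abstract alludes to, below is a minimal sketch of a cumulative-threshold ordinal loss over graded relevance labels. All names (`OrdinalRelevanceLoss`, the cutpoint parameterization) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class OrdinalRelevanceLoss(nn.Module):
    """Ordinal-regression loss over K relevance grades (0 .. K-1).

    Common cumulative-threshold formulation: a single query-video
    score is compared against K-1 learned cutpoints, and each
    comparison is trained as a binary "grade > k" decision.
    """
    def __init__(self, num_grades: int):
        super().__init__()
        # K-1 cutpoints, initialized in increasing order.
        self.cutpoints = nn.Parameter(torch.arange(num_grades - 1).float())
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, scores: torch.Tensor, grades: torch.Tensor) -> torch.Tensor:
        # scores: (B,) relevance scores; grades: (B,) integer labels.
        logits = scores.unsqueeze(1) - self.cutpoints.unsqueeze(0)  # (B, K-1)
        # Target for cutpoint k is 1 iff the true grade exceeds k.
        targets = (grades.unsqueeze(1) > torch.arange(
            self.cutpoints.numel(), device=grades.device)).float()
        return self.bce(logits, targets)

# Usage: scores would come from a (hypothetical) multi-modal ranking head.
loss_fn = OrdinalRelevanceLoss(num_grades=4)
scores = torch.randn(8)                # one score per query-video pair
grades = torch.randint(0, 4, (8,))     # annotated relevance grades
loss = loss_fn(scores, grades)
```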
- Anthology ID:
- 2023.emnlp-industry.31
- Volume:
- Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Mingxuan Wang, Imed Zitouni
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 322–330
- URL:
- https://preview.aclanthology.org/add_missing_videos/2023.emnlp-industry.31/
- DOI:
- 10.18653/v1/2023.emnlp-industry.31
- Cite (ACL):
- Chengcan Ye, Ting Peng, Tim Chang, Zhiyi Zhou, and Feng Wang. 2023. Query-aware Multi-modal based Ranking Relevance in Video Search. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 322–330, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Query-aware Multi-modal based Ranking Relevance in Video Search (Ye et al., EMNLP 2023)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2023.emnlp-industry.31.pdf