Enhancing Partially Relevant Video Retrieval with Robust Alignment Learning

Long Zhang (张龙); Peipei Song; Jianfeng Dong; Kun Li; Xun Yang

doi:10.18653/v1/2025.findings-emnlp.248

Abstract

Partially Relevant Video Retrieval (PRVR) aims to retrieve untrimmed videos that are only partially relevant to a given query. The core challenge is learning query-video alignment that is robust to spurious semantic correlations arising from inherent data uncertainty: 1) query ambiguity, where the query incompletely characterizes the target video and often contains uninformative tokens, and 2) partial video relevance, where abundant query-irrelevant segments introduce contextual noise into cross-modal alignment. Existing methods typically focus on enhancing multi-scale clip representations and retrieving the most relevant clip. However, the inherent data uncertainty in PRVR leaves them vulnerable to distractor videos with spurious similarities, which leads to suboptimal performance. To fill this gap, we propose the Robust Alignment Learning (RAL) framework, which explicitly models data uncertainty. Its key innovations are: 1) we pioneer probabilistic modeling for PRVR by encoding videos and queries as multivariate Gaussian distributions, which not only quantifies data uncertainty but also enables proxy-level matching that captures the variability of cross-modal correspondences; and 2) we account for the heterogeneous informativeness of query words by introducing learnable confidence gates that dynamically weight the similarity. As a plug-and-play solution, RAL can be seamlessly integrated into existing architectures. Extensive experiments across diverse retrieval backbones demonstrate its effectiveness.
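The abstract does not give RAL's exact formulation, so the following is only a minimal sketch of the two ideas it names, under loudly stated assumptions. Each query and video is assumed to be a diagonal Gaussian (a mean vector plus a log-variance vector); "proxy-level matching" is approximated by sampling `k` proxy vectors per modality with the reparameterization z = mu + sigma * eps and averaging cosine similarity over all query/video proxy pairs; and each "confidence gate" is assumed to be a sigmoid of a linear score on a query-word feature, with fixed numbers standing in for learned gate parameters. The function names (`sample_proxies`, `proxy_similarity`, `gated_word_similarity`), the proxy count `k`, and the parameters `gate_w` and `gate_b` are hypothetical and do not come from the paper; this is not the authors' implementation.

```python
import math
import random

# Hypothetical sketch, NOT the RAL authors' code: diagonal Gaussians,
# sampled proxies, and sigmoid gates are illustrative assumptions.


def sample_proxies(mu, log_var, k, rng):
    """Draw k proxies from N(mu, diag(exp(log_var))) via z = mu + sigma * eps."""
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
            for _ in range(k)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (epsilon avoids 0/0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def proxy_similarity(q_mu, q_log_var, v_mu, v_log_var, k=8, seed=0):
    """Assumed proxy-level matching: mean cosine over all k x k proxy pairs."""
    rng = random.Random(seed)  # fixed seed keeps the score reproducible
    q_proxies = sample_proxies(q_mu, q_log_var, k, rng)
    v_proxies = sample_proxies(v_mu, v_log_var, k, rng)
    sims = [cosine(q, v) for q in q_proxies for v in v_proxies]
    return sum(sims) / len(sims)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_word_similarity(word_feats, word_sims, gate_w, gate_b):
    """Assumed confidence gate g_i = sigmoid(w . h_i + b) per query word;
    returns the gate-weighted average of the per-word similarities."""
    gates = [sigmoid(sum(w * h for w, h in zip(gate_w, feat)) + gate_b)
             for feat in word_feats]
    return sum(g * s for g, s in zip(gates, word_sims)) / sum(gates)


if __name__ == "__main__":
    tight = [-6.0, -6.0, -6.0]  # small variance: proxies stay near the mean
    print(proxy_similarity([1.0, 0.0, 0.0], tight, [1.0, 0.1, 0.0], tight))
    # An informative word (gate near 1) dominates an uninformative one.
    print(gated_word_similarity([[1.0], [-1.0]], [0.9, 0.1], [10.0], 0.0))
```

With tightly concentrated distributions, nearly aligned means give a proxy similarity close to 1 and orthogonal means give one close to 0; larger variances spread the proxies, so the score reflects uncertainty rather than only the means. Averaging over all proxy pairs and normalizing by the gate sum are simple illustrative choices, not details confirmed by the abstract.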

Anthology ID: 2025.findings-emnlp.248
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4615–4629
URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.248/
DOI: 10.18653/v1/2025.findings-emnlp.248
Cite (ACL): Long Zhang, Peipei Song, Jianfeng Dong, Kun Li, and Xun Yang. 2025. Enhancing Partially Relevant Video Retrieval with Robust Alignment Learning. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 4615–4629, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Enhancing Partially Relevant Video Retrieval with Robust Alignment Learning (Zhang et al., Findings 2025)
PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.248.pdf
Checklist: 2025.findings-emnlp.248.checklist.pdf
