Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization
Minghang Zheng, Shaogang Gong, Hailin Jin, Yuxin Peng, Yang Liu
Abstract
Video sentence localization aims to locate moments in an unstructured video according to a given natural language query. The main challenges are the high annotation cost and annotation bias. In this work, we study video sentence localization in a zero-shot setting, which learns with only video data without any annotation. Existing zero-shot pipelines usually generate event proposals and then generate a pseudo query for each event proposal. However, their event proposals are obtained via visual feature clustering, which is query-independent and inaccurate, and the pseudo queries are short or less interpretable. Moreover, existing approaches ignore the risk of pseudo-label noise when leveraging pseudo labels in training. To address the above problems, we propose Structure-based Pseudo Label generation (SPL), which first generates free-form interpretable pseudo queries before constructing query-dependent event proposals by modeling the event temporal structure. To mitigate the effect of pseudo-label noise, we propose a noise-resistant iterative method that repeatedly re-weights the training samples based on noise estimation to train a grounding model and correct pseudo labels. Experiments on the ActivityNet Captions and Charades-STA datasets demonstrate the advantages of our approach. Code can be found at https://github.com/minghangz/SPL.
- Anthology ID:
- 2023.acl-long.794
- Volume:
- Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2023
- Address:
- Toronto, Canada
- Editors:
- Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 14197–14209
- URL:
- https://aclanthology.org/2023.acl-long.794
- DOI:
- 10.18653/v1/2023.acl-long.794
- Cite (ACL):
- Minghang Zheng, Shaogang Gong, Hailin Jin, Yuxin Peng, and Yang Liu. 2023. Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14197–14209, Toronto, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization (Zheng et al., ACL 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2023.acl-long.794.pdf