Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization

Minghang Zheng; Shaogang Gong; Hailin Jin; Yuxin Peng; Yang Liu (刘扬)

doi:10.18653/v1/2023.acl-long.794

Abstract

Video sentence localization aims to locate moments in an unstructured video according to a given natural language query. The main challenges are the high cost of annotation and annotation bias. In this work, we study video sentence localization in a zero-shot setting, which learns from video data alone, without any annotations. Existing zero-shot pipelines usually generate event proposals and then generate a pseudo query for each event proposal. However, their event proposals are obtained via visual feature clustering, which is query-independent and inaccurate, and their pseudo queries are short or less interpretable. Moreover, existing approaches ignore the risk of pseudo-label noise when using pseudo labels for training. To address these problems, we propose Structure-based Pseudo Label generation (SPL), which first generates free-form, interpretable pseudo queries and then constructs query-dependent event proposals by modeling the temporal structure of events. To mitigate the effect of pseudo-label noise, we propose a noise-resistant iterative method that repeatedly re-weights training samples based on noise estimation in order to train a grounding model and correct the pseudo labels. Experiments on the ActivityNet Captions and Charades-STA datasets demonstrate the advantages of our approach. Code can be found at https://github.com/minghangz/SPL.
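The two ideas in the abstract, a query-dependent event proposal and noise-aware sample re-weighting, can be illustrated with a small sketch. This is NOT the authors' implementation (see the linked repository for that); it is a simplified, hypothetical illustration under stated assumptions. It assumes a per-frame similarity score between a pseudo query and each frame is already available, picks the contiguous segment whose mean similarity inside most exceeds the mean similarity outside, and uses the temporal IoU between a model prediction and its pseudo label as a stand-in noise estimate for weighting. The names `query_dependent_proposal`, `temporal_iou`, and `reweight` and the `floor` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]


def query_dependent_proposal(sim: Sequence[float], min_len: int = 1) -> Tuple[int, int]:
    """Hypothetical proposal search: return the inclusive frame range (start, end)
    maximizing mean(sim inside) - mean(sim outside)."""
    n = len(sim)
    if n == 0:
        raise ValueError("need at least one frame")
    # Prefix sums make each segment's inside sum an O(1) lookup.
    prefix = [0.0]
    for s in sim:
        prefix.append(prefix[-1] + s)
    total = prefix[-1]
    best, best_score = (0, n - 1), -math.inf
    for start in range(n):
        for end in range(start + min_len - 1, n):
            inside_len = end - start + 1
            inside_sum = prefix[end + 1] - prefix[start]
            outside_len = n - inside_len
            inside_mean = inside_sum / inside_len
            # Assumption: an empty outside region contributes a mean of 0.
            outside_mean = (total - inside_sum) / outside_len if outside_len else 0.0
            score = inside_mean - outside_mean
            if score > best_score:
                best_score, best = score, (start, end)
    return best


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def reweight(pseudo: List[Segment], predicted: List[Segment], floor: float = 0.1) -> List[float]:
    """Hypothetical noise proxy: samples whose pseudo label disagrees with the
    model's prediction get lower training weight, never below `floor`."""
    return [max(floor, temporal_iou(p, q)) for p, q in zip(pseudo, predicted)]


# Usage: frames 2-3 match the query best, so they form the proposal.
print(query_dependent_proposal([0.1, 0.2, 0.9, 0.8, 0.1]))
print(reweight([(0.0, 10.0), (0.0, 10.0)], [(5.0, 15.0), (20.0, 30.0)]))
```

The exhaustive O(n²) search is only meant to make the "high inside, low outside" structure explicit; a real system would work on feature-level similarities and would likely constrain segment lengths.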

Anthology ID:: 2023.acl-long.794
Volume:: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2023
Address:: Toronto, Canada
Editors:: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 14197–14209
URL:: https://preview.aclanthology.org/nschneid-patch-2/2023.acl-long.794/
DOI:: 10.18653/v1/2023.acl-long.794
Cite (ACL):: Minghang Zheng, Shaogang Gong, Hailin Jin, Yuxin Peng, and Yang Liu. 2023. Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14197–14209, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):: Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization (Zheng et al., ACL 2023)
PDF:: https://preview.aclanthology.org/nschneid-patch-2/2023.acl-long.794.pdf
Video:: https://preview.aclanthology.org/nschneid-patch-2/2023.acl-long.794.mp4