GRIZAL: Generative Prior-guided Zero-Shot Temporal Action Localization
Onkar Kishor Susladkar, Gayatri Sudhir Deshmukh, Vandan Gorade, Sparsh Mittal
Abstract
Zero-shot temporal action localization (TAL) aims to temporally localize actions in videos without prior training examples. To address the challenges of TAL, we propose GRIZAL, a model that uses multimodal embeddings and dynamic motion cues to localize actions effectively. GRIZAL achieves sample diversity by using large-scale generative models: GPT-4 for generating textual augmentations and DALL-E for generating image augmentations. Our model integrates vision-language embeddings with optical flow cues and is optimized through a blend of supervised and self-supervised loss functions. On the ActivityNet, Thumos14 and Charades-STA datasets, GRIZAL greatly outperforms state-of-the-art zero-shot TAL models, demonstrating its robustness and adaptability across a wide range of video content. We will release all models and code publicly.
- Anthology ID:
- 2024.emnlp-main.1061
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 19046–19059
- URL:
- https://aclanthology.org/2024.emnlp-main.1061
- DOI:
- 10.18653/v1/2024.emnlp-main.1061
- Cite (ACL):
- Onkar Kishor Susladkar, Gayatri Sudhir Deshmukh, Vandan Gorade, and Sparsh Mittal. 2024. GRIZAL: Generative Prior-guided Zero-Shot Temporal Action Localization. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 19046–19059, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- GRIZAL: Generative Prior-guided Zero-Shot Temporal Action Localization (Susladkar et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.emnlp-main.1061.pdf