@inproceedings{yu-etal-2022-ghan,
title = "{GHAN}: Graph-Based Hierarchical Aggregation Network for Text-Video Retrieval",
author = "Yu, Yahan and
Hu, Bojie and
Li, Yu",
editor = "Goldberg, Yoav and
Kozareva, Zornitsa and
Zhang, Yue",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/2022.emnlp-main.374/",
doi = "10.18653/v1/2022.emnlp-main.374",
pages = "5547--5557",
abstract = "Text-video retrieval focuses on two aspects: cross-modality interaction and video-language encoding. Currently, the mainstream approach is to train a joint embedding space for multimodal interactions. However, there are structural and semantic differences between text and video, making this approach challenging for fine-grained understanding. In order to solve this, we propose an end-to-end graph-based hierarchical aggregation network for text-video retrieval according to the hierarchy possessed by text and video. We design a token-level weighted network to refine intra-modality representations and construct a graph-based message passing attention network for global-local alignment across modality. We conduct experiments on the public datasets MSR-VTT-9K, MSR-VTT-7K and MSVD, and achieve Recall@1 of 73.0{\%}, 65.6{\%}, and 64.0{\%} , which is 25.7{\%}, 16.5{\%}, and 14.2{\%} better than the current state-of-the-art model."
}
Markdown (Informal)
[GHAN: Graph-Based Hierarchical Aggregation Network for Text-Video Retrieval](https://aclanthology.org/2022.emnlp-main.374/) (Yu et al., EMNLP 2022)
ACL
Yahan Yu, Bojie Hu, and Yu Li. 2022. GHAN: Graph-Based Hierarchical Aggregation Network for Text-Video Retrieval. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5547–5557, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
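
The abstract above describes a token-level refinement step followed by graph-based message-passing attention for global-local cross-modal alignment. Below is a minimal illustrative sketch of one round of such message passing in PyTorch; the class name, tensor shapes, and single-round bipartite attention are assumptions made for illustration, not the authors' released GHAN implementation.

```python
# Illustrative sketch only: generic cross-modal message-passing attention
# in the spirit of GHAN's global-local alignment. All names and shapes
# are hypothetical, not taken from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalMessagePassing(nn.Module):
    """One round of attention-based message passing between text-token
    nodes and video-frame nodes of a bipartite graph (hypothetical)."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # queries from the text side
        self.k = nn.Linear(dim, dim)   # keys from the video side
        self.v = nn.Linear(dim, dim)   # values carried by messages
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_nodes: torch.Tensor, video_nodes: torch.Tensor):
        # text_nodes:  (B, T, D) token-level features
        # video_nodes: (B, F, D) frame-level features
        attn = torch.softmax(
            self.q(text_nodes) @ self.k(video_nodes).transpose(1, 2)
            / text_nodes.size(-1) ** 0.5,
            dim=-1,
        )                                         # (B, T, F) edge weights
        messages = attn @ self.v(video_nodes)     # aggregate video -> text
        return self.norm(text_nodes + messages)   # residual node update

# Usage: refine token features with frame-level messages, then score
# retrieval by cosine similarity of mean-pooled representations.
layer = CrossModalMessagePassing(dim=512)
text = torch.randn(2, 12, 512)    # batch of 2 captions, 12 tokens each
video = torch.randn(2, 30, 512)   # batch of 2 clips, 30 frames each
text_refined = layer(text, video)
score = F.cosine_similarity(text_refined.mean(1), video.mean(1))
```

GHAN's hierarchical aggregation presumably stacks such updates across multiple granularity levels of the text and video hierarchies; this sketch shows only a single token-to-frame attention round.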