Dense Procedure Captioning in Narrated Instructional Videos
Botian Shi, Lei Ji, Yaobo Liang, Nan Duan, Peng Chen, Zhendong Niu, Ming Zhou
Abstract
Understanding narrated instructional videos is important for both research and real-world web applications. Motivated by video dense captioning, we propose a model to generate procedure captions from narrated instructional videos, which are sequences of step-wise clips with descriptions. Previous works on video dense captioning learn video segments and generate captions without considering transcripts. We argue that transcripts in narrated instructional videos can enhance video representation by providing fine-grained complementary and semantic textual information. In this paper, we introduce a framework to (1) extract procedures by a cross-modality module, which fuses video content with the entire transcript; and (2) generate captions by encoding video frames as well as a snippet of transcripts within each extracted procedure. Experiments show that our model can achieve state-of-the-art performance in procedure extraction and captioning, and the ablation studies demonstrate that both the video frames and the transcripts are important for the task.
- Anthology ID:
- P19-1641
- Volume:
- Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Month:
- July
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Anna Korhonen, David Traum, Lluís Màrquez
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6382–6391
- URL:
- https://aclanthology.org/P19-1641
- DOI:
- 10.18653/v1/P19-1641
- Cite (ACL):
- Botian Shi, Lei Ji, Yaobo Liang, Nan Duan, Peng Chen, Zhendong Niu, and Ming Zhou. 2019. Dense Procedure Captioning in Narrated Instructional Videos. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6382–6391, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Dense Procedure Captioning in Narrated Instructional Videos (Shi et al., ACL 2019)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/P19-1641.pdf