Sparse Frame Grouping Network with Action Centered for Untrimmed Video Paragraph Captioning
Guorui Yu, Yimin Hu, Yuejie Zhang, Rui Feng, Tao Zhang, Shang Gao
Abstract
Generating paragraph captions for untrimmed videos without event annotations is challenging, especially when aiming to enhance precision and minimize repetition at the same time. To address this challenge, we propose a module called Sparse Frame Grouping (SFG). It dynamically groups event information with the help of action information for the entire video and excludes redundant frames within pre-defined clips. To enhance the performance, an Intra Contrastive Learning technique is designed to align the SFG module with the core event content in the paragraph, and an Inter Contrastive Learning technique is employed to learn action-guided context with reduced static noise simultaneously. Extensive experiments are conducted on two benchmark datasets (ActivityNet Captions and YouCook2). Results demonstrate that SFG outperforms the state-of-the-art methods on all metrics.
- Anthology ID: 2023.findings-emnlp.970
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 14571–14580
- URL: https://aclanthology.org/2023.findings-emnlp.970
- DOI: 10.18653/v1/2023.findings-emnlp.970
- Cite (ACL): Guorui Yu, Yimin Hu, Yuejie Zhang, Rui Feng, Tao Zhang, and Shang Gao. 2023. Sparse Frame Grouping Network with Action Centered for Untrimmed Video Paragraph Captioning. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14571–14580, Singapore. Association for Computational Linguistics.
- Cite (Informal): Sparse Frame Grouping Network with Action Centered for Untrimmed Video Paragraph Captioning (Yu et al., Findings 2023)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2023.findings-emnlp.970.pdf