Abstract
Automatically describing videos in natural language is an ambitious problem that could bridge our understanding of vision and language. We propose a hierarchical approach: first generating video descriptions as sequences of simple sentences, then, at the next level, producing a more complex and fluent description in natural language. While the simple sentences describe simple actions in the form (subject, verb, object), the second-level paragraph descriptions, which indirectly use information from the first level, present the visual content in a more compact, coherent, and semantically rich manner. To this end, we introduce the first video dataset in the literature annotated with captions at two levels of linguistic complexity. Extensive experiments demonstrate that our hierarchical linguistic representation, from simple to complex language, allows us to train a two-stage network that generates significantly more complex paragraphs than current one-stage approaches.
- Anthology ID:
- 2020.coling-main.220
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Donia Scott, Nuria Bel, Chengqing Zong
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 2436–2447
- URL:
- https://aclanthology.org/2020.coling-main.220
- DOI:
- 10.18653/v1/2020.coling-main.220
- Cite (ACL):
- Simion-Vlad Bogolin, Ioana Croitoru, and Marius Leordeanu. 2020. A hierarchical approach to vision-based language generation: from simple sentences to complex natural language. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2436–2447, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- A hierarchical approach to vision-based language generation: from simple sentences to complex natural language (Bogolin et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2020.coling-main.220.pdf