Abstract
Human information needs are essentially multimodal in nature, enabling maximum exploitation of situated context. We introduce a dataset for sequential procedural (how-to) text generation from images in the cooking domain. The dataset consists of 16,441 cooking recipes with 160,479 photos associated with different steps. We set up a baseline motivated by the best-performing model in terms of human evaluation for the Visual Story Telling (ViST) task. In addition, we introduce two models that incorporate high-level structure learnt by a Finite State Machine (FSM) into the neural sequential generation process: (1) Scaffolding Structure in Decoder (SSiD) and (2) Scaffolding Structure in Loss (SSiL). Our best-performing model (SSiL) achieves a METEOR score of 0.31, which is an improvement of 0.6 over the baseline model. We also conducted a human evaluation of the generated grounded recipes, which revealed that 61% of evaluators found our proposed (SSiL) model better than the baseline model in terms of overall recipes. We also discuss an analysis of the output, highlighting key NLP issues for prospective directions.
- Anthology ID:
- P19-1606
- Volume:
- Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Month:
- July
- Year:
- 2019
- Address:
- Florence, Italy
- Editors:
- Anna Korhonen, David Traum, Lluís Màrquez
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 6040–6046
- URL:
- https://aclanthology.org/P19-1606
- DOI:
- 10.18653/v1/P19-1606
- Cite (ACL):
- Khyathi Chandu, Eric Nyberg, and Alan W Black. 2019. Storyboarding of Recipes: Grounded Contextual Generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6040–6046, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal):
- Storyboarding of Recipes: Grounded Contextual Generation (Chandu et al., ACL 2019)
- PDF:
- https://preview.aclanthology.org/teach-a-man-to-fish/P19-1606.pdf
- Data
- VIST