Diverse and Relevant Visual Storytelling with Scene Graph Embeddings
Xudong Hong | Rakshith Shetty | Asad Sayeed | Khushboo Mehra | Vera Demberg | Bernt Schiele
Proceedings of the 24th Conference on Computational Natural Language Learning
A problem in automatically generated stories for image sequences is that they use overly generic vocabulary and phrase structure and fail to match the distributional characteristics of human-generated text. We address this problem by introducing explicit representations for objects and their relations by extracting scene graphs from the images. Utilizing an embedding of this scene graph enables our model to more explicitly reason over objects and their relations during story generation, compared to the global features from an object classifier used in previous work. We apply metrics that account for the diversity of words and phrases of generated stories as well as for reference to narratively-salient image features and show that our approach outperforms previous systems. Our experiments also indicate that our models obtain competitive results on reference-based metrics.