Abstract
Recent large pre-trained models have achieved strong performance in multimodal language generation, which requires a joint effort of vision and language modeling. However, most previous generation tasks are based on a single image input and produce short text descriptions that are not grounded in the input images. In this work, we propose a shared task on visually grounded story generation. The input is an image sequence, and the output is a story that is conditioned on the input images. This task is particularly challenging because: 1) the protagonists in the generated stories need to be grounded in the images, and 2) the output story should be a coherent long-form text. We aim to advance the study of vision-based story generation by accepting submissions that propose new methods as well as new evaluation measures.
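To make the task's input/output contract concrete, below is a minimal baseline sketch: caption each image in the sequence with an off-the-shelf captioning model and stitch the captions into a naive "story". This is not the challenge's method or any submitted system; the BLIP checkpoint, the `generate_story` function name, and the concatenation strategy are all assumptions for illustration. A real submission would additionally need to keep protagonists consistent across images and produce genuinely coherent long-form text.

```python
# Naive illustrative baseline (assumption, not the challenge's official system):
# per-image captioning with BLIP, then concatenation of the captions.
from typing import List

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

MODEL_NAME = "Salesforce/blip-image-captioning-base"  # assumed checkpoint
processor = BlipProcessor.from_pretrained(MODEL_NAME)
model = BlipForConditionalGeneration.from_pretrained(MODEL_NAME)


def generate_story(image_paths: List[str]) -> str:
    """Map an ordered image sequence to a story: the task's I/O signature."""
    sentences = []
    for path in image_paths:
        image = Image.open(path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=30)
        sentences.append(processor.decode(out[0], skip_special_tokens=True))
    # Stitching captions does not yield a coherent narrative; it only
    # demonstrates the sequence-in, long-text-out shape of the task.
    return " ".join(s.strip().capitalize() + "." for s in sentences)


if __name__ == "__main__":
    print(generate_story(["frame1.jpg", "frame2.jpg", "frame3.jpg"]))
```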
- Anthology ID: 2023.inlg-genchal.3
- Volume: Proceedings of the 16th International Natural Language Generation Conference: Generation Challenges
- Month: September
- Year: 2023
- Address: Prague, Czechia
- Editor: Simon Mille
- Venues: INLG | SIGDIAL
- SIG: SIGGEN
- Publisher: Association for Computational Linguistics
- Pages: 17–22
- URL: https://aclanthology.org/2023.inlg-genchal.3
- Cite (ACL): Xudong Hong, Khushboo Mehra, Asad Sayeed, and Vera Demberg. 2023. Visually Grounded Story Generation Challenge. In Proceedings of the 16th International Natural Language Generation Conference: Generation Challenges, pages 17–22, Prague, Czechia. Association for Computational Linguistics.
- Cite (Informal): Visually Grounded Story Generation Challenge (Hong et al., INLG-SIGDIAL 2023)
- PDF: https://aclanthology.org/2023.inlg-genchal.3.pdf