Abstract
Image captioning models are typically trained on data that is collected from people who are asked to describe an image, without being given any further task context. As we argue here, this context independence is likely to cause problems for transferring to task settings in which image description is bound by task demands. We demonstrate that careful design of data collection is required to obtain image descriptions which are contextually bounded to a particular meta-level task. As a task, we use MeetUp!, a text-based communication game where two players have the goal of finding each other in a visual environment. To reach this goal, the players need to describe images representing their current location. We analyse a dataset from this domain and show that the nature of image descriptions found in MeetUp! is diverse, dynamic and rich with phenomena that are not present in descriptions obtained through a simple image captioning task, which we ran for comparison.
- Anthology ID: W18-6547
- Volume: Proceedings of the 11th International Conference on Natural Language Generation
- Month: November
- Year: 2018
- Address: Tilburg University, The Netherlands
- Editors: Emiel Krahmer, Albert Gatt, Martijn Goudbeek
- Venue: INLG
- SIG: SIGGEN
- Publisher: Association for Computational Linguistics
- Pages: 397–402
- URL: https://aclanthology.org/W18-6547
- DOI: 10.18653/v1/W18-6547
- Cite (ACL): Nikolai Ilinykh, Sina Zarrieß, and David Schlangen. 2018. The Task Matters: Comparing Image Captioning and Task-Based Dialogical Image Description. In Proceedings of the 11th International Conference on Natural Language Generation, pages 397–402, Tilburg University, The Netherlands. Association for Computational Linguistics.
- Cite (Informal): The Task Matters: Comparing Image Captioning and Task-Based Dialogical Image Description (Ilinykh et al., INLG 2018)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/W18-6547.pdf