@inproceedings{ilinykh-etal-2018-task,
    title = "The Task Matters: Comparing Image Captioning and Task-Based Dialogical Image Description",
    author = "Ilinykh, Nikolai  and
      Zarrie{\ss}, Sina  and
      Schlangen, David",
    editor = "Krahmer, Emiel  and
      Gatt, Albert  and
      Goudbeek, Martijn",
    booktitle = "Proceedings of the 11th International Conference on Natural Language Generation",
    month = nov,
    year = "2018",
    address = "Tilburg University, The Netherlands",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W18-6547/",
    doi = "10.18653/v1/W18-6547",
    pages = "397--402",
    abstract = "Image captioning models are typically trained on data that is collected from people who are asked to describe an image, without being given any further task context. As we argue here, this context independence is likely to cause problems for transferring to task settings in which image description is bound by task demands. We demonstrate that careful design of data collection is required to obtain image descriptions which are contextually bounded to a particular meta-level task. As a task, we use MeetUp!, a text-based communication game where two players have the goal of finding each other in a visual environment. To reach this goal, the players need to describe images representing their current location. We analyse a dataset from this domain and show that the nature of image descriptions found in MeetUp! is diverse, dynamic and rich with phenomena that are not present in descriptions obtained through a simple image captioning task, which we ran for comparison."
}