Large Language Models can Share Images, Too!

Young-Jun Lee; Dokyong Lee; Joo-won Sung; Jonghwan Hyeon; Ho-Jin Choi

doi:10.18653/v1/2024.findings-acl.39

Abstract

This paper explores the image-sharing capability of Large Language Models (LLMs), such as GPT-4 and LLaMA 2, in a zero-shot setting. To facilitate a comprehensive evaluation of LLMs, we introduce the PhotoChat++ dataset, which includes enriched annotations (i.e., intent, triggering sentence, image description, and salient information). Furthermore, we present the gradient-free and extensible Decide, Describe, and Retrieve (DribeR) framework. With extensive experiments, we unlock the image-sharing capability of DribeR equipped with LLMs in zero-shot prompting, with ChatGPT achieving the best performance. Our findings also reveal the emergent image-sharing ability in LLMs under zero-shot conditions, validating the effectiveness of DribeR. We use this framework to demonstrate its practicality and effectiveness in two real-world scenarios: (1) human-bot interaction and (2) dataset augmentation. To the best of our knowledge, this is the first study to assess the image-sharing ability of various LLMs in a zero-shot setting. We make our source code and dataset publicly available at https://github.com/passing2961/DribeR.

Anthology ID: 2024.findings-acl.39
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 692–713
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-acl.39/
DOI: 10.18653/v1/2024.findings-acl.39
Cite (ACL): Young-Jun Lee, Dokyong Lee, Joo Won Sung, Jonghwan Hyeon, and Ho-Jin Choi. 2024. Large Language Models can Share Images, Too!. In Findings of the Association for Computational Linguistics: ACL 2024, pages 692–713, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Large Language Models can Share Images, Too! (Lee et al., Findings 2024)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2024.findings-acl.39.pdf
