@inproceedings{xing-etal-2024-learning,
    title = "Learning to Ask Denotative and Connotative Questions for Knowledge-based {VQA}",
    author = "Xing, Xiaoying  and
      Xiong, Peixi  and
      Fan, Lei  and
      Li, Yunxuan  and
      Wu, Ying",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2024.findings-emnlp.487/",
    doi = "10.18653/v1/2024.findings-emnlp.487",
    pages = "8301--8315",
    abstract = "Large language models (LLMs) have attracted increasing attention due to their prominent performance on various tasks. Recent works seek to leverage LLMs for knowledge-based visual question answering (VQA) tasks, which require commonsense knowledge to answer questions about an image, since LLMs have acquired rich knowledge from large-scale training. Several methods have been proposed to leverage frozen LLMs by converting visual information into textual prompts. However, how to efficiently exploit the knowledge of LLMs and how to bridge the disconnect between visual information and language models remain open problems. In this paper, we propose to let LLMs learn to ask (L2A) informative questions to collect essential visual information. We introduce the concepts of denotation and connotation to promote image and question understanding and to provide clear guidance with respect to the objective of question generation. In this way, the model can better capture the associations between different concepts and efficiently collect both explicit information and implicit relevant information that contribute to the final answer. Experiments demonstrate that our proposed method achieves consistent performance on various knowledge-based VQA datasets."
}