Learning to Ask Denotative and Connotative Questions for Knowledge-based VQA

Xiaoying Xing, Peixi Xiong, Lei Fan, Yunxuan Li, Ying Wu


Abstract
Large language models (LLMs) have attracted increasing attention due to their prominent performance on various tasks. Since LLMs acquire rich knowledge from large-scale training, recent works seek to leverage them for knowledge-based visual question answering (VQA), a task that requires commonsense knowledge to answer questions about an image. Several methods have been proposed to leverage frozen LLMs by converting visual information into textual prompts. However, how to efficiently exploit the knowledge of LLMs and how to bridge the disconnect between visual information and language models remain open problems. In this paper, we propose to let LLMs learn to ask (L2A) informative questions to collect essential visual information. We introduce the concepts of denotation and connotation to promote image and question understanding and to provide clear guidance on the objective of question generation. In this way, the model can better capture the associations between different concepts and efficiently collect both the explicit information and the implicit relevant information that contribute to the final answer. Experiments demonstrate that our proposed method achieves consistent performance on various knowledge-based VQA datasets.
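The abstract describes frozen-LLM pipelines in which visual information is turned into a textual prompt, here enriched with answers to questions that target roughly explicit image content (denotation) and implied, related information (connotation). The sketch below is a hypothetical illustration of only that final prompt-assembly step and is not the authors' L2A implementation: the `ProbeQA` record, the `build_prompt` function, the `KINDS` labels, and the prompt layout are all assumptions. The question generator and the visual model that answers the questions are deliberately left out.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels (assumption): "denotative" for explicit image content,
# "connotative" for implied, related information.
KINDS = ("denotative", "connotative")


@dataclass
class ProbeQA:
    """A question asked about the image and the answer some visual model returned."""
    question: str
    answer: str
    kind: str

    def __post_init__(self) -> None:
        # Reject labels outside the two assumed question types.
        if self.kind not in KINDS:
            raise ValueError(f"unknown question kind: {self.kind!r}")


def build_prompt(caption: str, probes: List[ProbeQA], vqa_question: str) -> str:
    """Flatten a caption and probe Q&A pairs into a text-only prompt for a frozen LLM.

    Denotative pairs are listed before connotative ones; this ordering is an
    arbitrary choice for the sketch, not something the abstract specifies.
    """
    lines = [f"Context: {caption}"]
    for kind in KINDS:
        for p in probes:
            if p.kind == kind:
                lines.append(f"[{kind}] Q: {p.question} A: {p.answer}")
    lines.append(f"Question: {vqa_question}")
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    probes = [
        ProbeQA("What sport uses this board?", "surfing", "connotative"),
        ProbeQA("What is the man holding?", "a surfboard", "denotative"),
    ]
    print(build_prompt("A man standing on a beach.", probes,
                       "What activity is he likely to do next?"))
```

Keeping the two question types as labeled lines lets the frozen LLM see which facts were read directly off the image and which were inferred, while the LLM itself stays untouched.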
Anthology ID: 2024.findings-emnlp.487
Volume: Findings of the Association for Computational Linguistics: EMNLP 2024
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8301–8315
URL: https://preview.aclanthology.org/icon-24-ingestion/2024.findings-emnlp.487/
DOI: 10.18653/v1/2024.findings-emnlp.487
Cite (ACL): Xiaoying Xing, Peixi Xiong, Lei Fan, Yunxuan Li, and Ying Wu. 2024. Learning to Ask Denotative and Connotative Questions for Knowledge-based VQA. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 8301–8315, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Learning to Ask Denotative and Connotative Questions for Knowledge-based VQA (Xing et al., Findings 2024)
PDF: https://preview.aclanthology.org/icon-24-ingestion/2024.findings-emnlp.487.pdf