Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models

Manas Jhalani; Annervaz K M; Pushpak Bhattacharyya

Abstract

In the realm of multimodal tasks, Visual Question Answering (VQA) plays a crucial role by addressing natural language questions grounded in visual content. Knowledge-Based Visual Question Answering (KBVQA) extends this setting by drawing on external knowledge, in addition to the image, to answer questions. We introduce an approach for KBVQA that augments the existing vision-language transformer encoder-decoder model (OFA). Our main contribution involves enhancing questions by incorporating relevant external knowledge extracted from knowledge graphs, using a dynamic triple extraction

Anthology ID: 2024.icon-1.3
Volume: Proceedings of the 21st International Conference on Natural Language Processing (ICON)
Month: December
Year: 2024
Address: AU-KBC Research Centre, Chennai, India
Editors: Sobha Lalitha Devi, Karunesh Arora
Venue: ICON
Publisher: NLP Association of India (NLPAI)
Pages: 21–36
URL: https://preview.aclanthology.org/ingest-emnlp/2024.icon-1.3/
Cite (ACL): Manas Jhalani, Annervaz K M, and Pushpak Bhattacharyya. 2024. Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models. In Proceedings of the 21st International Conference on Natural Language Processing (ICON), pages 21–36, AU-KBC Research Centre, Chennai, India. NLP Association of India (NLPAI).
Cite (Informal): Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models (Jhalani et al., ICON 2024)
PDF: https://preview.aclanthology.org/ingest-emnlp/2024.icon-1.3.pdf