Visual–Linguistic Abductive Reasoning with LLMs for Knowledge-based Visual Question Answering

Jieun Kim, Yujin Jeong, Sung-Bae Cho


Abstract
Recent attempts to leverage the reasoning abilities and pre-trained knowledge of large language models (LLMs) for multi-modal reasoning follow two main approaches: aligning image features with the linguistic space, and converting images into textual cues that exploit the implicit reasoning capabilities of LLMs. Although both integrate visual information into the reasoning pipeline, they often treat visual perception and language reasoning as separate processes, limiting the potential for fully unified multi-modal reasoning. In this paper, we propose a novel method, Visual–Linguistic Abductive Reasoning (ViLA), inspired by human abductive reasoning. ViLA hypothesizes a plausible answer, generates the corresponding visual and textual premises, and employs fuzzy scoring to select the most coherent combination, from which the final inference is derived. This process integrates the visual and linguistic modalities into interpretable abductive reasoning chains, enabling unified multi-modal reasoning. Without fine-tuning LLMs or retrieving external knowledge, ViLA improves performance by 2.31% on AOKVQA, 1.7% on OKVQA, and 1.7% on GQA over previous state-of-the-art models, while also improving interpretability and stability.
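The hypothesize–premise–score loop described in the abstract can be sketched in a few lines. Everything below is an illustrative assumption, not the authors' implementation: in ViLA the premises are generated by an LLM, whereas here the plausibility scores are stubbed in directly, and the product t-norm stands in for whatever fuzzy scoring the paper actually uses.

```python
def fuzzy_score(visual: float, textual: float) -> float:
    """Combine premise plausibilities with a product t-norm, one common
    fuzzy conjunction (the paper's exact scoring may differ)."""
    return visual * textual

def select_answer(candidates):
    """Pick the hypothesized answer whose visual and textual premises
    are jointly most coherent.

    `candidates` maps each hypothesized answer to a (visual, textual)
    plausibility pair in [0, 1], stubbing out the LLM premise-generation
    step.
    """
    return max(candidates, key=lambda a: fuzzy_score(*candidates[a]))

# Toy example: "umbrella" has the most coherent premise pair.
candidates = {
    "umbrella": (0.9, 0.8),   # strong visual and textual support
    "raincoat": (0.6, 0.9),   # textually plausible, weak visual premise
    "sunhat":   (0.7, 0.3),
}
print(select_answer(candidates))  # -> umbrella
```

The product t-norm rewards hypotheses supported by *both* modalities, which matches the abstract's emphasis on selecting the most coherent visual–textual combination rather than the strongest single cue.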
Anthology ID:
2026.findings-eacl.343
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
6529–6544
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.343/
Cite (ACL):
Jieun Kim, Yujin Jeong, and Sung-Bae Cho. 2026. Visual–Linguistic Abductive Reasoning with LLMs for Knowledge-based Visual Question Answering. In Findings of the Association for Computational Linguistics: EACL 2026, pages 6529–6544, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Visual–Linguistic Abductive Reasoning with LLMs for Knowledge-based Visual Question Answering (Kim et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.343.pdf
Checklist:
2026.findings-eacl.343.checklist.pdf