Changbo Wang

2026

K-GIP: Diagnosing Logical Fractures in Large Vision-Language Models via Verification Scene Graphs and Sequential Pruning
Yujun Hu | Xiaoyu Zhou | Changbo Wang | Gaoqi He
Findings of the Association for Computational Linguistics: ACL 2026

Diagnosing fine-grained hallucinations in Large Vision-Language Models (LVLMs) can greatly advance their reliable deployment in real-world applications. Nevertheless, current benchmarks predominantly employ flat metrics that treat errors in isolation, leaving a gap in evaluating the complex causal dependencies between visual perception and textual reasoning. To fill this gap, we introduce the Knowledge-Guided In-Context Probing (K-GIP) framework. Specifically, K-GIP constructs a high-fidelity dual-perception ground truth to transform abstract priors into multi-granularity queries. Furthermore, we propose a Verification Scene Graph metric equipped with a Sequential Logic Pruning protocol, which explicitly models existence-attribute dependencies to strictly penalize logical fractures. We conduct comprehensive evaluations of mainstream LVLMs across three datasets using K-GIP. The experimental results highlight that our methodology successfully isolates deep reasoning failures from simple perceptual misses. We hope K-GIP can serve as a valuable and rigorous standard to assess logical robustness in multimodal systems.

