Alireza Mohamadian
2026
Stable Evidence, Unstable Decisions: An Empirical Analysis of Model Decision Stability in Vision–Language Models
Ali Khoramfar | Mohammad Javad Dousti | Alireza Mohamadian | Heshaam Faili
Findings of the Association for Computational Linguistics: ACL 2026
Vision–language models (VLMs) provide visual information alongside their predictions, but it remains unclear whether consistency in such information implies consistent decisions. We study this question in a controlled medical-imaging setting using brain MRI with pathology-confirmed labels and expert lesion annotations. For each human subject and modality, we construct configurations that retain the lesion content while varying the surrounding context and scale, and we measure decision flips together with consistency in the model-reported influential slices. Across four diverse VLMs (including proprietary, open-source, and domain-specific models), flip rates reach as high as 75% across lesion-containing presentations, often despite high overlap in reported evidence. When lesion-related content is removed, proprietary models rarely produce a categorical diagnosis, with abstention rates ranging from 63% to 99%. These results reveal a mismatch between reported evidence and decisions, motivating evaluation beyond accuracy. Our evaluation dataset is publicly available on Hugging Face.
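The two quantities named in the abstract, decision flips and overlap in reported slices, can be illustrated with a minimal sketch. The abstract does not give the exact metric definitions, so everything below is an assumption made for illustration: a "flip" is taken to mean that a model's decisions for one subject and modality are not identical across configurations, and evidence overlap is taken to be the mean pairwise Jaccard index of the reported slice indices. The function names, case keys, class labels, and slice numbers are hypothetical toy values, not data from the paper.

```python
from itertools import combinations


def has_flip(decisions):
    """True if the decisions for one case are not all identical (assumed flip definition)."""
    return len(set(decisions)) > 1


def flip_rate(decisions_by_case):
    """Fraction of cases (subject, modality) with at least one decision flip."""
    cases = list(decisions_by_case.values())
    return sum(has_flip(d) for d in cases) / len(cases)


def jaccard(a, b):
    """Jaccard index of two collections of slice indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty reports are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_overlap(slice_sets):
    """Mean Jaccard index over all pairs of configurations for one case."""
    pairs = list(combinations(slice_sets, 2))
    if not pairs:
        return 1.0  # a single configuration trivially agrees with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy inputs (hypothetical): decisions per configuration for two cases.
toy_decisions = {
    "case_1": ["class_a", "class_a", "class_b"],  # decision changes once
    "case_2": ["class_a", "class_a", "class_a"],  # decision is stable
}

# Toy inputs (hypothetical): slices reported as influential in three configurations.
toy_slices = [[10, 11, 12], [10, 11, 12], [11, 12, 13]]

rate = flip_rate(toy_decisions)
overlap = mean_pairwise_overlap(toy_slices)
```

Under these assumed definitions, a case can show a flip while its reported slices still overlap substantially, which is the kind of mismatch between evidence and decision that the abstract describes.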