Stable Evidence, Unstable Decisions: An Empirical Analysis of Model Decision Stability in Vision–Language Models
Ali Khoramfar, Mohammad Javad Dousti, Alireza Mohamadian, Heshaam Faili
Abstract
Vision–language models (VLMs) provide visual information alongside their predictions, but it remains unclear whether consistency in such information implies consistent decisions. We study this question in a controlled medical-imaging setting using brain MRI with pathology-confirmed labels and expert lesion annotations. For each human subject and modality, we construct configurations that retain the lesion content while varying the surrounding context and scale, and we measure decision flips together with consistency in model-reported influential slices. Across four diverse VLMs (including proprietary, open-source, and domain-specific models), flip rates reach up to 75% across lesion-containing presentations, often despite high overlap in reported evidence. When lesion-related content is removed, proprietary models rarely produce a categorical diagnosis, with abstention rates ranging from 63% to 99%. These results reveal a mismatch between reported evidence and decisions, motivating evaluation beyond accuracy. Our evaluation dataset is publicly available on Hugging Face.
- Anthology ID:
- 2026.findings-acl.1303
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 26153–26166
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1303/
- Cite (ACL):
- Ali Khoramfar, Mohammad Javad Dousti, Alireza Mohamadian, and Heshaam Faili. 2026. Stable Evidence, Unstable Decisions: An Empirical Analysis of Model Decision Stability in Vision–Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 26153–26166, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Stable Evidence, Unstable Decisions: An Empirical Analysis of Model Decision Stability in Vision–Language Models (Khoramfar et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.findings-acl.1303.pdf