Once Correct, Still Wrong: Counterfactual Hallucination in Multilingual Vision-Language Models

Basel Mousi, Fahim Dalvi, Shammur Absar Chowdhury, Firoj Alam, Nadir Durrani


Abstract
Vision–language models (VLMs) can achieve high accuracy while still accepting **culturally plausible but visually incorrect** interpretations. Existing hallucination benchmarks rarely test this failure mode, particularly outside Western contexts and English. We introduce **M2CQA**, a culturally grounded multimodal benchmark built from images spanning 17 MENA countries, paired with contrastive true and counterfactual statements in English, Arabic, and its dialects. To isolate hallucination beyond raw accuracy, we propose the **CounterFactual Hallucination Rate (CFHR)**, which measures counterfactual acceptance conditioned on correctly answering the true statement. Evaluating state-of-the-art VLMs under multiple prompting strategies, we find that CFHR rises sharply in Arabic, especially in dialects, even when true-statement accuracy remains high. Moreover, reasoning-first prompting consistently increases counterfactual hallucination, while answering before justifying improves robustness. We make the dataset publicly available for the community (https://huggingface.co/datasets/QCRI/M2CQA).
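The CFHR definition in the abstract can be read as a conditional rate: among items where the model answers the true statement correctly, the fraction where it also accepts the paired counterfactual. The sketch below illustrates that reading only. The function name `cfhr`, the `(true_correct, counterfactual_accepted)` pair layout, and the choice to return `None` when no item qualifies are assumptions made for illustration, not the paper's released evaluation code.

```python
from typing import Iterable, Optional, Tuple


def cfhr(pairs: Iterable[Tuple[bool, bool]]) -> Optional[float]:
    """Counterfactual acceptance rate, conditioned on a correct true-statement answer.

    Each pair is (true_correct, counterfactual_accepted) for one image:
      true_correct            -- the model answered the true statement correctly
      counterfactual_accepted -- the model also accepted the counterfactual statement
    This layout is hypothetical; adapt it to however predictions are stored.
    """
    # Keep only the items the model got right on the true statement.
    conditioned = [cf_accepted for true_correct, cf_accepted in pairs if true_correct]
    if not conditioned:
        # The rate is undefined when no true statement was answered correctly.
        return None
    # Fraction of those items where the counterfactual was accepted as well.
    return sum(conditioned) / len(conditioned)


# Toy predictions: three items are correct on the true statement,
# and two of those three also accept the counterfactual.
example = [(True, False), (True, True), (False, True), (True, True)]
rate = cfhr(example)
```

Under this reading, an item the model gets wrong on the true statement (the third pair above) is excluded from the denominator, which is what separates the metric from plain counterfactual accuracy.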
Anthology ID:
2026.findings-acl.234
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4763–4788
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.234/
Cite (ACL):
Basel Mousi, Fahim Dalvi, Shammur Absar Chowdhury, Firoj Alam, and Nadir Durrani. 2026. Once Correct, Still Wrong: Counterfactual Hallucination in Multilingual Vision-Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 4763–4788, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Once Correct, Still Wrong: Counterfactual Hallucination in Multilingual Vision-Language Models (Mousi et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.234.pdf
Checklist:
 2026.findings-acl.234.checklist.pdf