@inproceedings{wei-etal-2026-multimodal,
    title = "Do Multimodal {LLM}s Understand Order? Measuring the Fragility of Multimodal Reasoning under Input Order Perturbations",
    author = "Wei, Sheng-Lun and
      Liao, Yu-Ling and
      Huang, Hen-Hsen and
      Chen, Hsin-Hsi",
    editor = "Piperidis, Stelios and
      Bel, N{\'u}ria and
      van den Heuvel, Henk and
      Ide, Nancy and
      Krek, Simon and
      Toral, Antonio",
    booktitle = "International Conference on Language Resources and Evaluation",
    month = may,
    year = "2026",
    address = "Palma de Mallorca, Spain",
    publisher = "ELRA Language Resource Association",
    url = "https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.716/",
    internal-note = "NOTE(review): preview/staging URL -- switch to https://aclanthology.org/2026.lrec-main.716/ once the canonical Anthology page is live",
    pages = "9118--9128",
    abstract = "Multimodal reasoning has progressed rapidly with large vision-language models (LVLMs), yet their robustness under input variations remains underexplored. This study investigates positional bias in LVLMs for multimodal multiple-choice questions. Our analysis shows that model predictions are sensitive to both choice and modality ordering. We conduct a large-scale evaluation on MMMU, CVQA, and MMBench using fourteen representative models. Further analysis examines how question properties, including difficulty, domain, and image type, affect robustness. We also assess whether text-based mitigation strategies transfer to the VQA setting and perform ablation studies on self-consistency and reasoning complexity. Overall, our findings provide the first comprehensive understanding of positional bias from a vision-language perspective, highlighting key challenges in achieving stable multimodal reasoning."
}
Markdown (Informal)
[Do Multimodal LLMs Understand Order? Measuring the Fragility of Multimodal Reasoning under Input Order Perturbations](https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.716/) (Wei et al., LREC 2026)
ACL