Promptception: How Sensitive Are Large Multimodal Models to Prompts?

Mohamed Insaf Ismithdeen, Muhammad Uzair Khattak, Salman Khan


Abstract
Despite the success of Large Multimodal Models (LMMs) in recent years, prompt design for LMMs in Multiple-Choice Question Answering (MCQA) remains poorly understood. We show that even minor variations in prompt phrasing and structure can lead to accuracy deviations of up to 15% for certain prompts and models. This variability poses a challenge for transparent and fair LMM evaluation, as models often report their best-case performance using carefully selected prompts. To address this, we introduce Promptception, a systematic framework for evaluating prompt sensitivity in LMMs. It consists of 61 prompt types, spanning 15 categories and 6 supercategories, each targeting specific aspects of prompt formulation, and is used to evaluate 10 LMMs ranging from lightweight open-source models to GPT-4o and Gemini 1.5 Pro, across 3 MCQA benchmarks: MMStar, MMMU-Pro, and MVBench. Our findings reveal that proprietary models exhibit greater sensitivity to prompt phrasing, reflecting tighter alignment with instruction semantics, while open-source models are steadier but struggle with nuanced and complex phrasing. Based on this analysis, we propose Prompting Principles tailored to proprietary and open-source LMMs, enabling more robust and fair model evaluation.
Anthology ID:
2025.findings-emnlp.1302
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
23950–23985
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1302/
DOI:
10.18653/v1/2025.findings-emnlp.1302
Cite (ACL):
Mohamed Insaf Ismithdeen, Muhammad Uzair Khattak, and Salman Khan. 2025. Promptception: How Sensitive Are Large Multimodal Models to Prompts?. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 23950–23985, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Promptception: How Sensitive Are Large Multimodal Models to Prompts? (Ismithdeen et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1302.pdf
Checklist:
2025.findings-emnlp.1302.checklist.pdf