2025
Mixed Signals: Decoding VLMs’ Reasoning and Underlying Bias in Vision-Language Conflict
Pouya Pezeshkpour | Moin Aminnaseri | Estevam Hruschka
Findings of the Association for Computational Linguistics: EMNLP 2025
Vision-language models (VLMs) have demonstrated impressive performance by effectively integrating visual and textual information to solve complex tasks. However, it remains unclear how these models reason over visual and textual data together, or how the flow of information between modalities is structured. In this paper, we examine how VLMs reason by analyzing their biases when confronted with scenarios that present conflicting image and text cues, a common occurrence in real-world applications. To uncover the extent and nature of these biases, we build on existing benchmarks to create five datasets of mismatched image-text pairs covering mathematics, science, and visual descriptions. Our analysis shows that VLMs favor text on simpler queries but shift toward images as query complexity increases. This bias also correlates with model scale, with the difference between the percentage of image- and text-preferred responses ranging from +56.8% (image favored) to -85.1% (text favored), depending on the task and model. In addition, we explore three mitigation strategies: simple prompt modifications, prompts that explicitly instruct models how to handle conflicting information (akin to chain-of-thought prompting), and a task decomposition strategy that analyzes each modality separately before combining the results. Our findings indicate that the effectiveness of these strategies at identifying and mitigating bias varies significantly and is closely linked to the model's overall performance on the task and on the specific modality in question. We release our dataset and code.
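As a rough illustration of the preference-gap metric quoted above, the following Python sketch (hypothetical helper names and labels, not the paper's released code) tallies which modality a model followed on each mismatched pair and reports the image-minus-text percentage difference:

    # Illustrative sketch only; names and labels are assumptions, not the authors' code.
    def preference_gap(responses):
        """responses: one label per conflict example, each 'image', 'text',
        or 'other', marking which modality's answer the VLM followed."""
        n = len(responses)
        image_pct = 100 * sum(r == "image" for r in responses) / n
        text_pct = 100 * sum(r == "text" for r in responses) / n
        # Positive gap: the model favors the image; negative: the text.
        return image_pct - text_pct

    # Example: 70 image-preferred, 25 text-preferred, 5 neither -> gap = +45.0
    print(preference_gap(["image"] * 70 + ["text"] * 25 + ["other"] * 5))

Under this reading, a gap near +100 or -100 would indicate a near-total bias toward one modality, consistent with the +56.8% to -85.1% range reported across tasks and models.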
2021
ParsiNLU: A Suite of Language Understanding Challenges for Persian
Daniel Khashabi | Arman Cohan | Siamak Shakeri | Pedram Hosseini | Pouya Pezeshkpour | Malihe Alikhani | Moin Aminnaseri | Marzieh Bitaab | Faeze Brahman | Sarik Ghazarian | Mozhdeh Gheini | Arman Kabiri | Rabeeh Karimi Mahabadi | Omid Memarrast | Ahmadreza Mosallanezhad | Erfan Noury | Shahab Raji | Mohammad Sadegh Rasooli | Sepideh Sadeghi | Erfan Sadeqi Azer | Niloofar Safi Samghabadi | Mahsa Shafaei | Saber Sheybani | Ali Tazarv | Yadollah Yaghoobzadeh
Transactions of the Association for Computational Linguistics, Volume 9
Despite the progress made in recent years on natural language understanding (NLU) challenges, the majority of this progress remains concentrated on resource-rich languages like English. This work focuses on Persian, one of the most widely spoken languages in the world, for which few NLU datasets are available. The availability of high-quality evaluation datasets is a necessity for reliably assessing progress across NLU tasks and domains. We introduce ParsiNLU, the first benchmark for the Persian language that covers a range of language understanding tasks, including reading comprehension and textual entailment. The datasets are collected in a multitude of ways, often involving manual annotation by native speakers, resulting in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, providing valuable insight into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.