Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective

Meiqi Chen, Yixin Cao, Yan Zhang, Chaochao Lu


Abstract
Recent advancements in Large Language Models (LLMs) have facilitated the development of Multimodal LLMs (MLLMs). Despite their impressive capabilities, MLLMs often over-rely on unimodal biases (e.g., language bias and vision bias), leading to incorrect answers in complex multimodal tasks. To investigate this issue, we propose a causal framework to interpret the biases in Visual Question Answering (VQA) problems. Within this framework, we conduct an in-depth causal analysis to assess the causal effect of these biases on MLLM predictions. Based on the analysis, we introduce (1) MORE, a novel dataset of 12,000 challenging VQA instances that require multi-hop reasoning and overcoming unimodal biases, and (2) CAVE, a causality-enhanced agent framework that guides models to comprehensively integrate information from different modalities and mitigate biases. Our experiments show that MLLMs perform poorly on MORE, indicating strong unimodal biases and limited semantic understanding; however, when integrated with CAVE, they show promising improvements in reasoning and bias mitigation. These findings provide important insights for the development of more robust MLLMs and contribute to the broader goal of advancing multimodal AI systems capable of deeper understanding and reasoning. Our project page is at https://github.com/OpenCausaLab/MORE.
Anthology ID:
2024.findings-emnlp.960
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
16449–16469
URL:
https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2024.findings-emnlp.960/
DOI:
10.18653/v1/2024.findings-emnlp.960
Cite (ACL):
Meiqi Chen, Yixin Cao, Yan Zhang, and Chaochao Lu. 2024. Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 16449–16469, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective (Chen et al., Findings 2024)
PDF:
https://preview.aclanthology.org/Add-Cong-Liu-Florida-Atlantic-University-author-id/2024.findings-emnlp.960.pdf
Data:
2024.findings-emnlp.960.data.zip