@inproceedings{adhikari-lapata-2025-debating,
title = "Debating for Better Reasoning in Vision-Language Models",
author = "Adhikari, Ashutosh and
Lapata, Mirella",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Ros{\'e}, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-emnlp.853/",
doi = "10.18653/v1/2025.findings-emnlp.853",
pages = "15766--15784",
ISBN = "979-8-89176-335-7",
abstract = "As Large Language Models (LLMs) gain expertise across diverse domains and modalities, scalable oversight becomes increasingly challenging, particularly when their capabilities may surpass human evaluators. Debate has emerged as a promising mechanism for enabling such oversight. We extend the debate paradigm to a multimodal setting, exploring its potential for blind models to supervise and enhance the performance of sighted ones. We focus on visual question answering (VQA), where two ``sighted'' expert vision-language models debate an answer, while a ``blind'' (text-only) judge adjudicates based solely on the quality of the arguments. In our framework, the experts only defend answers aligned with their beliefs, thereby obviating the need for explicit role-playing and concentrating the debate on instances of expert disagreement. Experiments on several multimodal tasks demonstrate that the debate framework consistently outperforms individual expert models. Moreover, judgments from blind LLMs can be used to instil reasoning capabilities in vision-language models through fine-tuning."
}