Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions
David Acuna, Ximing Lu, Jaehun Jung, Hyunwoo Kim, Amlan Kar, Sanja Fidler, Yejin Choi
Abstract
Recent research in vision-language models (VLMs) has centered around the possibility of equipping them with implicit long-form chain-of-thought reasoning—akin to the success observed in language models—via distillation and reinforcement learning. But what about the non-reasoning models already trained and deployed across the internet? Should we simply abandon them, or is there hope for a search mechanism that can elicit hidden knowledge and induce long reasoning traces— without any additional training or supervision? In this paper, we explore this possibility using a Monte Carlo Tree Search (MCTS)-inspired algorithm, which injects subquestion–subanswer pairs into the model’s output stream. We show that framing reasoning as a search process—where subquestions act as latent decisions within a broader inference trajectory—helps the model “connect the dots” between fragmented knowledge and produce extended reasoning traces in non-reasoning models. We evaluate our method across three benchmarks and observe consistent improvements. Notably, our approach yields a 2% overall improvement on MMMU-PRO, including a significant 9% gain in Liberal Arts.- Anthology ID:
- 2025.emnlp-main.1230
- Volume:
- Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2025
- Address:
- Suzhou, China
- Editors:
- Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 24158–24171
- Language:
- URL:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1230/
- DOI:
- Cite (ACL):
- David Acuna, Ximing Lu, Jaehun Jung, Hyunwoo Kim, Amlan Kar, Sanja Fidler, and Yejin Choi. 2025. Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 24158–24171, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal):
- Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions (Acuna et al., EMNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1230.pdf