Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning

Mohit Vaishnav; Tanel Tammet

Abstract

Vision–language models (VLMs) often fail on abstract visual reasoning benchmarks such as Bongard problems, raising the question of whether the main bottleneck lies in reasoning or representation. We study this on Bongard-LOGO, a synthetic benchmark of abstract concept learning with ground-truth generative programs, by comparing end-to-end VLMs on raw images with large language models (LLMs) given symbolic inputs derived from those images. Using symbolic inputs as a diagnostic probe rather than a practical multimodal architecture, our Componential–Grammatical (C–G) paradigm reformulates Bongard-LOGO as a symbolic reasoning task based on LOGO-style action programs or structured descriptions. LLMs achieve large and consistent gains, reaching mid-90s accuracy on Free-form problems, while a strong visual baseline remains near chance under matched task definitions. Ablations on input format, explicit concept prompts, and minimal visual grounding show that these factors matter much less than the shift from pixels to symbolic structure. These results identify representation as a key bottleneck in abstract visual reasoning and show how symbolic input can serve as a controlled diagnostic upper bound.

Anthology ID: 2026.conll-main.19
Volume: Proceedings of the 30th Conference on Computational Natural Language Learning
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Claire Bonial, Yevgeni Berzak
Venues: CoNLL | WS
Publisher: Association for Computational Linguistics
Pages: 318–343
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.19/
Cite (ACL): Mohit Vaishnav and Tanel Tammet. 2026. Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning. In Proceedings of the 30th Conference on Computational Natural Language Learning, pages 318–343, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Symbolic Grounding Reveals Representational Bottlenecks in Abstract Visual Reasoning (Vaishnav & Tammet, CoNLL 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.conll-main.19.pdf
