Answering Cross-Dimensional Geometric Visual Questions by Multi-constraint Spatial Reasoning

Dongling Li; Qi Chen; Jianxing Yu; Hanjiang Lai; Yanghui Rao; Wenqing Chen; Jian Yin

Abstract

This paper focuses on answering complex visual questions that require cross-dimensional (e.g., 2D-to-3D) spatial reasoning. This task, called SpatialQA, can enhance a machine’s spatial cognitive abilities along the chain "plane representation - space reconstruction - semantic inference," and has substantial application value. Existing methods often recognize only 1-D visual objects and relations; they lack the ability to represent content in a cross-dimensional space and fail to grasp structured geometric knowledge such as face-face topology and texture details. This causes problems such as texture misalignment and topological confusion, leading to error accumulation and incorrect answers. To address this problem, we propose a new method with good cross-dimensional reasoning capabilities. Specifically, we first analyze the input image to capture its relations in the 2D plane. To derive the topological relations in 3D space, we employ a dual-channel augmentation technique that retrieves topologically isomorphic examples and geometric rules, supplementing missing but crucial reasoning clues. We then design a multi-perspective verifier that finds inconsistencies in the macroscopic outlines and eliminates incorrect options. Based on visual clues, we develop a question-guided detector that finely analyzes the texture details and relations of each surface, capturing inconsistencies at the micro level. This corrects reasoning bias so that the right answer can be derived. Moreover, we create a large-scale dataset of 22,483 samples for evaluation. The results show the effectiveness of our method.

Anthology ID: 2026.findings-acl.1656
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 33093–33111
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1656/
Cite (ACL): Dongling Li, Qi Chen, Jianxing Yu, Hanjiang Lai, Yanghui Rao, Wenqing Chen, and Jian Yin. 2026. Answering Cross-Dimensional Geometric Visual Questions by Multi-constraint Spatial Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 33093–33111, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Answering Cross-Dimensional Geometric Visual Questions by Multi-constraint Spatial Reasoning (Li et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1656.pdf
Checklist: 2026.findings-acl.1656.checklist.pdf