Answering Cross-Dimensional Geometric Visual Questions by Multi-constraint Spatial Reasoning
Dongling Li, Qi Chen, Jianxing Yu, Hanjiang Lai, Yanghui Rao, Wenqing Chen, Jian Yin
Abstract
This paper focuses on the task of answering complex visual questions that involve cross-dimensional (e.g., 2D-to-3D) spatial reasoning. This task, called SpatialQA, can enhance a machine's spatial cognitive abilities across "plane representation - space reconstruction - semantic inference" and has great application value. Existing methods often recognize only 1-D visual objects and relations; they lack the ability to represent content in a cross-dimensional space and fail to grasp structured geometric knowledge such as face-face topology and texture details. This causes problems such as texture misalignment and topological confusion, leading to error accumulation and incorrect answers. To address this problem, we propose a new method with good cross-dimensional reasoning capabilities. In detail, we first analyze the input image, capturing its relations in the 2D plane. To derive the topological relations in 3D space, we employ a dual-channel augmentation technique to retrieve topologically isomorphic examples and geometric rules, supplementing the missing but crucial reasoning clues. We then design a multi-perspective verifier to find inconsistencies in the macroscopic outlines, eliminating incorrect options. Based on visual clues, we develop a question-guided detector to finely analyze the texture details and relations of each surface, capturing inconsistencies at the micro level. This corrects reasoning bias so that the right answer can be derived. Moreover, we create a large-scale dataset with 22,483 samples to conduct evaluations. The results show the effectiveness of our method.
- Anthology ID:
- 2026.findings-acl.1656
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 33093–33111
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1656/
- Cite (ACL):
- Dongling Li, Qi Chen, Jianxing Yu, Hanjiang Lai, Yanghui Rao, Wenqing Chen, and Jian Yin. 2026. Answering Cross-Dimensional Geometric Visual Questions by Multi-constraint Spatial Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 33093–33111, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Answering Cross-Dimensional Geometric Visual Questions by Multi-constraint Spatial Reasoning (Li et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1656.pdf