@inproceedings{mi-etal-2025-language,
    title = "Language-to-Space Programming for Training-Free 3{D} Visual Grounding",
    author = "Mi, Boyu  and
      Wang, Hanqing  and
      Wang, Tai  and
      Chen, Yilun  and
      Pang, Jiangmiao",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.191/",
    pages = "3844--3864",
    ISBN = "979-8-89176-332-6",
    abstract = "3D visual grounding (3DVG) is challenging due to the need to understand 3D spatial relations. While supervised approaches have achieved superior performance, they are constrained by the scarcity and high annotation costs of 3D vision-language datasets. Training-free approaches based on LLMs/VLMs eliminate the need for large-scale training data, but they either incur prohibitive grounding time and token costs or have unsatisfactory accuracy. To address the challenges, we introduce a novel method for training-free 3D visual grounding, namely **La**nguage-to-**S**pace **P**rogramming (LaSP). LaSP introduces LLM-generated codes to analyze 3D spatial relations among objects, along with a pipeline that evaluates and optimizes the codes automatically. Experimental results demonstrate that LaSP achieves 52.9{\%} accuracy on the Nr3D benchmark, ranking among the best training-free methods. Moreover, it substantially reduces the grounding time and token costs, offering a balanced trade-off between performance and efficiency."
}