Where is Misty? Interpreting Spatial Descriptors by Modeling Regions in Space

Nikita Kitaev, Dan Klein


Abstract
We present a model for locating regions in space based on natural language descriptions. Given a 3D scene and a sentence, our model associates words in the sentence with regions in the scene, interprets relations such as ‘on top of’ or ‘next to,’ and finally locates the region described in the sentence. All components form a single neural network that is trained end-to-end without prior knowledge of object segmentation. To evaluate our model, we construct and release a new dataset consisting of Minecraft scenes with crowdsourced natural language descriptions. We achieve a 32% relative error reduction compared to a strong neural baseline.
Anthology ID:
D17-1015
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
157–166
URL:
https://aclanthology.org/D17-1015
DOI:
10.18653/v1/D17-1015
Cite (ACL):
Nikita Kitaev and Dan Klein. 2017. Where is Misty? Interpreting Spatial Descriptors by Modeling Regions in Space. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 157–166, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Where is Misty? Interpreting Spatial Descriptors by Modeling Regions in Space (Kitaev & Klein, EMNLP 2017)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/D17-1015.pdf
Video:
https://vimeo.com/238230308
Code:
nikitakit/voxelworld
Data:
CLEVR