Abstract
We present a model for locating regions in space based on natural language descriptions. Starting with a 3D scene and a sentence, our model is able to associate words in the sentence with regions in the scene, interpret relations such as ‘on top of’ or ‘next to,’ and finally locate the region described in the sentence. All components form a single neural network that is trained end-to-end without prior knowledge of object segmentation. To evaluate our model, we construct and release a new dataset consisting of Minecraft scenes with crowdsourced natural language descriptions. We achieve a 32% relative error reduction compared to a strong neural baseline.
- Anthology ID:
- D17-1015
- Volume:
- Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
- Month:
- September
- Year:
- 2017
- Address:
- Copenhagen, Denmark
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Pages:
- 157–166
- URL:
- https://aclanthology.org/D17-1015
- DOI:
- 10.18653/v1/D17-1015
- Cite (ACL):
- Nikita Kitaev and Dan Klein. 2017. Where is Misty? Interpreting Spatial Descriptors by Modeling Regions in Space. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 157–166, Copenhagen, Denmark. Association for Computational Linguistics.
- Cite (Informal):
- Where is Misty? Interpreting Spatial Descriptors by Modeling Regions in Space (Kitaev & Klein, EMNLP 2017)
- PDF:
- https://preview.aclanthology.org/remove-xml-comments/D17-1015.pdf
- Code
- nikitakit/voxelworld
- Data
- CLEVR