Representation Learning for Grounded Spatial Reasoning

Michael Janner, Karthik Narasimhan, Regina Barzilay


Abstract
The interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive rewards. The proposed model learns a representation of the world steered by instruction text. This design allows for precise alignment of local neighborhoods with corresponding verbalizations, while also handling global references in the instructions. We train our model with reinforcement learning using a variant of generalized value iteration. The model outperforms state-of-the-art approaches on several metrics, yielding a 45% reduction in goal localization error.
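The training procedure described in the abstract, a learned instruction-conditioned value map refined by a variant of generalized value iteration, can be illustrated with a small sketch. The snippet below is not the released JannerM/spatial-reasoning code: it assumes a simple 4-connected grid world and substitutes a hand-made placeholder reward map for the language-conditioned representation the paper learns, then runs tabular value iteration and reads off the predicted goal as the argmax of the value map.

# Minimal sketch (not the authors' code): tabular value iteration on a 2D grid.
# The reward map is a placeholder for the instruction-conditioned representation
# that the paper learns end-to-end.
import numpy as np

def value_iteration(reward, gamma=0.95, iters=50):
    """Compute a state-value map on a 2D grid with 4-connected moves."""
    H, W = reward.shape
    value = np.zeros((H, W))
    for _ in range(iters):
        # Neighbor values in each direction, with borders padded by edge values.
        padded = np.pad(value, 1, mode="edge")
        neighbors = np.stack([
            padded[:-2, 1:-1],   # up
            padded[2:, 1:-1],    # down
            padded[1:-1, :-2],   # left
            padded[1:-1, 2:],    # right
        ])
        # Greedy backup: local reward plus discounted best neighbor value.
        value = reward + gamma * neighbors.max(axis=0)
    return value

if __name__ == "__main__":
    # Hypothetical reward map: +1 at the goal cell an instruction would pick out,
    # a small step penalty everywhere else.
    reward = np.full((8, 8), -0.01)
    reward[2, 5] = 1.0
    value = value_iteration(reward)
    goal = np.unravel_index(value.argmax(), value.shape)
    print("predicted goal cell:", goal)

In the paper the analogous value computation is part of a model trained with reinforcement learning, so the map of rewards and values is predicted from the instruction text rather than specified by hand as in this illustration.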
Anthology ID: Q18-1004
Volume: Transactions of the Association for Computational Linguistics, Volume 6
Year: 2018
Address: Cambridge, MA
Editors: Lillian Lee, Mark Johnson, Kristina Toutanova, Brian Roark
Venue: TACL
Publisher: MIT Press
Pages: 49–61
URL: https://aclanthology.org/Q18-1004
DOI: 10.1162/tacl_a_00004
Cite (ACL): Michael Janner, Karthik Narasimhan, and Regina Barzilay. 2018. Representation Learning for Grounded Spatial Reasoning. Transactions of the Association for Computational Linguistics, 6:49–61.
Cite (Informal): Representation Learning for Grounded Spatial Reasoning (Janner et al., TACL 2018)
PDF: https://preview.aclanthology.org/nschneid-patch-2/Q18-1004.pdf
Video: https://preview.aclanthology.org/nschneid-patch-2/Q18-1004.mp4
Code: JannerM/spatial-reasoning