Grounding Symbols in Multi-Modal Instructions
Yordan Hristov, Svetlin Penkov, Alex Lascarides, Subramanian Ramamoorthy
Abstract
As robots begin to cohabit with humans in semi-structured environments, the need arises to understand instructions involving rich variability—for instance, learning to ground symbols in the physical world. Realistically, this task must cope with small datasets consisting of a particular user's contextual assignment of meaning to terms. We present a method for processing a raw stream of cross-modal input—i.e., linguistic instructions, visual perception of a scene and a concurrent trace of 3D eye tracking fixations—to produce the segmentation of objects with a corresponding association to high-level concepts. To test our framework we present experiments in a table-top object manipulation scenario. Our results show that our model learns the user's notion of colour and shape from a small number of physical demonstrations, generalising to identifying physical referents for novel combinations of the words.
- Anthology ID: W17-2807
- Volume: Proceedings of the First Workshop on Language Grounding for Robotics
- Month: August
- Year: 2017
- Address: Vancouver, Canada
- Editors: Mohit Bansal, Cynthia Matuszek, Jacob Andreas, Yoav Artzi, Yonatan Bisk
- Venue: RoboNLP
- Publisher: Association for Computational Linguistics
- Pages: 49–57
- URL: https://aclanthology.org/W17-2807
- DOI: 10.18653/v1/W17-2807
- Cite (ACL): Yordan Hristov, Svetlin Penkov, Alex Lascarides, and Subramanian Ramamoorthy. 2017. Grounding Symbols in Multi-Modal Instructions. In Proceedings of the First Workshop on Language Grounding for Robotics, pages 49–57, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal): Grounding Symbols in Multi-Modal Instructions (Hristov et al., RoboNLP 2017)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/W17-2807.pdf
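To make the abstract's pipeline concrete, below is a minimal, hypothetical sketch, not the paper's implementation: each word is summarised by the mean features of the objects the user fixated while uttering it, and a novel word combination is resolved to the nearest scene object. The feature encoding, the demonstration data, and the nearest-prototype rule are all assumptions made for illustration.

```python
# Illustrative sketch only (NOT the paper's method): few-shot grounding of
# colour/shape words to object features from gaze-selected demonstrations.
from collections import defaultdict

import numpy as np

# Hypothetical per-object features: (hue, elongation). The 3D fixation
# trace is assumed to have already identified which object the user
# attended to while each instruction was spoken.
demonstrations = [
    # (words uttered, feature vector of the fixated object)
    (["red", "cube"],   np.array([0.02, 0.10])),
    (["red", "ball"],   np.array([0.03, 0.90])),
    (["green", "cube"], np.array([0.35, 0.15])),
]

# Accumulate feature samples per word, then summarise each word as the
# mean feature vector of the objects it co-occurred with.
samples = defaultdict(list)
for words, feats in demonstrations:
    for w in words:
        samples[w].append(feats)

prototypes = {w: np.mean(v, axis=0) for w, v in samples.items()}

def ground(words, candidate_objects):
    """Return the index of the candidate object whose features lie
    closest to the combined prototypes of the uttered words."""
    target = np.mean([prototypes[w] for w in words], axis=0)
    dists = [np.linalg.norm(f - target) for f in candidate_objects]
    return int(np.argmin(dists))

# "green ball" was never demonstrated as a pair, yet it can be resolved.
scene = [np.array([0.34, 0.88]), np.array([0.02, 0.12])]
print(ground(["green", "ball"], scene))  # -> 0
```

Because each word accrues its own prototype independently of the others, unseen pairings such as "green ball" can still pick out a referent, which mirrors the generalisation claim in the abstract.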