Human-in-the-loop Robotic Grasping Using BERT Scene Representation

Yaoxian Song, Penglei Sun, Pengfei Fang, Linyi Yang, Yanghua Xiao, Yue Zhang


Abstract
Current NLP techniques have been widely applied across many domains. In this paper, we propose a human-in-the-loop framework for robotic grasping in cluttered scenes. The framework investigates a language interface to the grasping process that allows the user to intervene through natural language commands. It is built on a state-of-the-art grasping baseline, in which we replace the scene-graph representation with a text representation of the scene encoded by BERT. Experiments both in simulation and on a physical robot show that the proposed method outperforms conventional object-agnostic and scene-graph-based methods in the literature. In addition, we find that human intervention can significantly improve performance. Our dataset and code are available on our project website https://sites.google.com/view/hitl-grasping-bert.
Anthology ID: 2022.coling-1.265
Volume: Proceedings of the 29th International Conference on Computational Linguistics
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Venue: COLING
Publisher: International Committee on Computational Linguistics
Pages: 2992–3006
URL: https://aclanthology.org/2022.coling-1.265
Cite (ACL): Yaoxian Song, Penglei Sun, Pengfei Fang, Linyi Yang, Yanghua Xiao, and Yue Zhang. 2022. Human-in-the-loop Robotic Grasping Using BERT Scene Representation. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2992–3006, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
Cite (Informal): Human-in-the-loop Robotic Grasping Using BERT Scene Representation (Song et al., COLING 2022)
PDF: https://preview.aclanthology.org/ingestion-script-update/2022.coling-1.265.pdf