Abstract
The Vision-and-Language Navigation (VLN) task involves navigating a mobile agent using linguistic commands and has applications in developing interfaces for autonomous mobility. In reality, natural human communication also encompasses non-verbal cues such as hand gestures and gaze. These gesture-guided instructions have been explored in Human-Robot Interaction systems for effective interaction, particularly in object-referring expressions. However, a notable gap exists in handling gesture-based demonstrative expressions in the outdoor VLN task. To address this, we introduce a novel dataset of gesture-guided outdoor VLN instructions with demonstrative expressions, designed with a focus on complex instructions requiring multi-hop reasoning between the multiple input modalities. In addition, our work includes a comprehensive analysis of the collected data and a comparative evaluation against existing datasets.
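To make the abstract's framing concrete, here is a minimal sketch of what a single gesture-guided instruction record could look like. The class names, fields, and values below are hypothetical illustrations, not the actual GesNavi schema.

```python
# Hypothetical sketch of one record in a gesture-guided outdoor VLN dataset.
# All field names and values are illustrative assumptions, not GesNavi's format.
from dataclasses import dataclass


@dataclass
class GestureCue:
    """A non-verbal cue accompanying the instruction (assumed fields)."""
    kind: str            # e.g. "pointing" or "gaze"
    timestamp_s: float   # offset into the instruction where the cue occurs
    target_bbox: tuple   # (x, y, w, h) of the referred region in the view


@dataclass
class NavigationRecord:
    """One instruction paired with visual context and a ground-truth route."""
    instruction: str     # instruction text containing a demonstrative expression
    view_ids: list       # identifiers of street-level views along the route
    gestures: list       # GestureCue objects aligned to the instruction
    route: list          # sequence of waypoint ids the agent should follow


# "that building" is only resolvable via the pointing gesture, so an agent
# must reason jointly over text, image, and gesture (multi-hop reasoning).
record = NavigationRecord(
    instruction="Go towards that building and turn left at the crossing.",
    view_ids=["view_0012", "view_0013"],
    gestures=[GestureCue(kind="pointing", timestamp_s=1.2,
                         target_bbox=(410, 95, 60, 120))],
    route=["n4", "n7", "n9"],
)
print(record.instruction)
```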
- Anthology ID: 2024.eacl-srw.23
- Volume: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop
- Month: March
- Year: 2024
- Address: St. Julian’s, Malta
- Editors: Neele Falk, Sara Papi, Mike Zhang
- Venue: EACL
- Publisher: Association for Computational Linguistics
- Pages: 290–295
- URL: https://aclanthology.org/2024.eacl-srw.23
- Cite (ACL): Aman Jain, Teruhisa Misu, Kentaro Yamada, and Hitomi Yanaka. 2024. GesNavi: Gesture-guided Outdoor Vision-and-Language Navigation. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 290–295, St. Julian’s, Malta. Association for Computational Linguistics.
- Cite (Informal): GesNavi: Gesture-guided Outdoor Vision-and-Language Navigation (Jain et al., EACL 2024)
- PDF: https://preview.aclanthology.org/nschneid-patch-3/2024.eacl-srw.23.pdf