Follow the Beaten Path: The Role of Route Patterns on Vision-Language Navigation Agents Generalization Abilities

Kourosh T Baghaei, Dieter Pfoser, Antonios Anastasopoulos


Abstract
Vision and language navigation (VLN) is a challenging task towards the creation of embodied agents: it requires spatial and temporal reasoning over instructions given in natural language and aligning them with the visual perception of an environment. Although a number of methods and approaches have been developed, none achieves human-level performance in outdoor settings, with gaps of up to 75 percent. The contributions of the visual and language modalities to the success of VLN have been studied; here, however, we focus on an overlooked property of routes and show that navigational instructions can be represented as patterns of actions that also describe trajectory shapes. Through carefully crafted experiments, we show that agents' generalization to unseen environments depends not only on visual and linguistic features, but also on the shape of the trajectories presented to the model during fine-tuning. Our experiments show that the diversity of action patterns during training is a key contributor to high agent success rates. Finally, we propose a data augmentation solution that fills the gap left by patterns missing from the training data. Our findings will guide researchers towards improved practices in the development and evaluation of VLN datasets and agents.
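To make the idea of "patterns of actions that describe trajectory shapes" concrete, the following is a minimal, hypothetical sketch and not the authors' definition. It assumes a discrete action vocabulary (`F` forward, `L` left, `R` right, `S` stop), collapses runs of forward steps so that only the route's shape remains, measures training-set pattern diversity with Shannon entropy (one possible diversity measure, chosen here for illustration), and lists evaluation patterns that never occur in training, i.e. the kind of gap a data augmentation step would need to fill. The function names are illustrative only.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

# Hypothetical action vocabulary (an assumption, not taken from the paper):
# "F" = move forward one node, "L" = turn left, "R" = turn right, "S" = stop.
Pattern = Tuple[str, ...]


def action_pattern(actions: Sequence[str]) -> Pattern:
    """Collapse a low-level action sequence into a coarse route shape.

    Consecutive forward moves are merged into one "F" and everything from
    the first stop onward is dropped, e.g. F F L F S -> ("F", "L", "F").
    """
    pattern: List[str] = []
    for a in actions:
        if a == "S":
            break  # the route ends at the first stop action
        if a == "F" and pattern and pattern[-1] == "F":
            continue  # merge runs of forward steps into a single segment
        pattern.append(a)
    return tuple(pattern)


def pattern_counts(routes: Iterable[Sequence[str]]) -> Counter:
    """Count how often each route shape occurs in a collection of routes."""
    return Counter(action_pattern(r) for r in routes)


def pattern_entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the pattern distribution; higher = more diverse."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def missing_patterns(train_routes: Iterable[Sequence[str]],
                     eval_routes: Iterable[Sequence[str]]) -> List[Pattern]:
    """Route shapes that appear in evaluation data but never in training."""
    seen = set(pattern_counts(train_routes))
    return sorted(set(pattern_counts(eval_routes)) - seen)


if __name__ == "__main__":
    # Toy routes invented for illustration only.
    train = [["F", "F", "L", "F", "S"], ["F", "R", "F", "S"], ["F", "F", "F", "S"]]
    eval_ = [["F", "L", "F", "F", "S"], ["F", "L", "F", "R", "F", "S"]]
    print(missing_patterns(train, eval_))  # [('F', 'L', 'F', 'R', 'F')]
    print(round(pattern_entropy(pattern_counts(train)), 3))  # 1.585
```

In this toy example the three training routes each have a distinct shape (entropy of log2(3) bits), yet the evaluation route with a left turn followed by a right turn has a shape absent from training. Under the paper's thesis, such missing shapes are where an agent's generalization would be expected to suffer, and where augmented routes would be targeted.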
Anthology ID: 2025.naacl-long.406
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 7986–8005
URL: https://preview.aclanthology.org/landing_page/2025.naacl-long.406/
Cite (ACL): Kourosh T Baghaei, Dieter Pfoser, and Antonios Anastasopoulos. 2025. Follow the Beaten Path: The Role of Route Patterns on Vision-Language Navigation Agents Generalization Abilities. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7986–8005, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): Follow the Beaten Path: The Role of Route Patterns on Vision-Language Navigation Agents Generalization Abilities (Baghaei et al., NAACL 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.naacl-long.406.pdf