Dieter Pfoser


2025

pdf bib
Follow the Beaten Path: The Role of Route Patterns on Vision-Language Navigation Agents Generalization Abilities
Kourosh T Baghaei | Dieter Pfoser | Antonios Anastasopoulos
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Vision and language navigation (VLN) is a challenging task towards the creation of embodied agents that requires spatial and temporal reasoning over the instructions provided in natural language and aligning them with the visual perception of an environment. Although a number of methods and approaches have been developed, none achieves human level performance in outdoor settings (by up to 75 percent). The contributions of visual and language modalities to the success of VLN have been studied, however here we focus on an overlooked property of routes and show that navigational instructions can be represented as patterns of actions that also describe trajectory shapes. Through carefully crafted experiments, we show that agents generalization to unseen environments depends not only on visual and linguistic features, but also on the shape of trajectories presented to the model during the fine-tuning. Our experiments show that the diversity of patterns of actions during training is a key contributor to high success rates for agents. Last, we propose a solution based on data augmentation that fills the gap in missing patterns of training data. Our findings will guide researchers towards improved practices in the development and evaluation of VLN datasets and agents.