2025
Is OpenVLA Truly Robust? A Systematic Evaluation of Positional Robustness
Yiran Pang | Yiheng Zhao | Zhuopu Zhou | Tingkai Hu | Ranxin Hou
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Pretrained language and vision-language models have become core components of vision-language-action models (VLAs) due to their strong spatial reasoning capabilities. Evaluating the robustness of VLAs is crucial to ensuring their reliability in practical scenarios. While prior work has focused on background and environment robustness, positional robustness remains underexplored. In this paper, we propose a comprehensive evaluation protocol for assessing the positional robustness of VLAs and apply it to OpenVLA, an open-source, high-performing, and efficient model well suited to real-world deployment. We find that OpenVLA succeeds only when the target object is placed at one of the two positions encountered during training. Even in these cases, the success rate never exceeds 50%, because the model exhibits memorized behavior: it randomly executes a grasping action toward one of the two fixed positions rather than relying on perception to localize the target object. This reveals that OpenVLA's positional robustness is extremely weak.