RoboFailRing: Retrieval-Augmented and Language Grounding Failure Detection for VLM-enabled Robotic Manipulation

Chenduo Ying, Linkang Du, Yuanchao Shu, Peng Cheng


Abstract
Reliable failure detection and causal reasoning are critical in robotic manipulation, as their absence risks robot damage and endangers human safety. Although recent Vision–Language Models (VLMs) have been applied to failure detection and causal reasoning, they typically assess outcomes retrospectively, only after task completion, and their reasoning accuracy is often limited. To address these issues, we introduce RoboFailRing, which enables timely failure detection during task execution and enhances the reasoning accuracy of VLMs. It achieves rapid failure detection by retrieving from a pre-constructed failure memory and returning a similarity-based decision. In addition, by providing a grounded failure report to VLMs, it improves the accuracy of their reasoning about failure causes and repair strategies. We evaluate RoboFailRing on two large-scale simulated datasets comprising over 6,000 failure trajectories and covering 81 distinct manipulation tasks. The results show that the average success rate of out-of-distribution failure detection reaches 80%, while the mean detection time is cut to roughly 50% of the baseline. Moreover, evaluations on real-world systems show an average 35% gain in VLM failure-reasoning accuracy. We make our code publicly available at: https://github.com/DynamicPoet/RoboFailRing.
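The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the embedding model, the similarity measure, or the decision rule, so the cosine similarity, the fixed threshold, and the names `cosine_similarity`, `detect_failure`, and the example failure labels are all assumptions made for illustration only.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical memory entry: (embedding of a stored failure, failure label).
MemoryEntry = Tuple[Sequence[float], str]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_failure(
    query: Sequence[float],
    memory: List[MemoryEntry],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Tuple[bool, Optional[str], float]:
    """Compare the current observation embedding against stored failures.

    Returns (is_failure, matched_label, best_score). A failure is flagged
    when the most similar stored entry reaches the threshold.
    """
    best_label: Optional[str] = None
    best_score = -1.0
    for embedding, label in memory:
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_label, best_score = label, score
    is_failure = best_score >= threshold
    return is_failure, (best_label if is_failure else None), best_score


# Toy usage with made-up 2-D embeddings and labels.
failure_memory: List[MemoryEntry] = [
    ([1.0, 0.0], "gripper slipped"),
    ([0.0, 1.0], "object missed"),
]
flagged, label, score = detect_failure([0.9, 0.1], failure_memory)
```

In a sketch like this, the matched label could serve as the seed for a textual failure report passed to a VLM, though how RoboFailRing actually builds its grounded report is described only in the full paper.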
Anthology ID:
2026.acl-long.602
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
13188–13202
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.602/
Cite (ACL):
Chenduo Ying, Linkang Du, Yuanchao Shu, and Peng Cheng. 2026. RoboFailRing: Retrieval-Augmented and Language Grounding Failure Detection for VLM-enabled Robotic Manipulation. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13188–13202, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
RoboFailRing: Retrieval-Augmented and Language Grounding Failure Detection for VLM-enabled Robotic Manipulation (Ying et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.602.pdf
Checklist:
 2026.acl-long.602.checklist.pdf