Non-Event Oriented Video Assessments in Long-Form Robot Videos
Stephanie M. Lukin, Kimberly A. Pollard, Claire Bonial, Cory J. Hayes, Ron Artstein, Kallirroi Georgila, David Traum
Abstract
We introduce Video-SCOUT, a novel dataset of sixty 20-minute robot-recorded videos from human-robot collaborative exploration exercises, together with a new video analysis method for these types of exploration videos. Unlike video from stationary cameras, where detection of motion can help identify events of interest, the camera in an exploration task is constantly in motion while the environment is stationary. Our analysis method, Non-Event Oriented Video Assessments (NOVA), uses vision-language models to select frames relevant for supporting a particular assessment within continuous long-form videos. Results of testing with two different video-language models reveal a trade-off between precision and recall and show gains in overall recall when combined with a human’s knowledge, suggesting that NOVA may improve human analysis of robot navigation. We outline future work to mitigate miscommunication in human-robot interaction by leveraging dialogue with NOVA in support of better collaboration.
- Anthology ID:
- 2026.magmar-main.8
- Volume:
- Proceedings of the 2nd Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2026)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, USA
- Editors:
- Kenton Murray, Reno Kriz
- Venues:
- MAGMaR | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 27–41
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.magmar-main.8/
- Cite (ACL):
- Stephanie M. Lukin, Kimberly A. Pollard, Claire Bonial, Cory J. Hayes, Ron Artstein, Kallirroi Georgila, and David Traum. 2026. Non-Event Oriented Video Assessments in Long-Form Robot Videos. In Proceedings of the 2nd Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2026), pages 27–41, San Diego, USA. Association for Computational Linguistics.
- Cite (Informal):
- Non-Event Oriented Video Assessments in Long-Form Robot Videos (Lukin et al., MAGMaR 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.magmar-main.8.pdf