EgoNormia: Benchmarking Physical-Social Norm Understanding
MohammadHossein Rezaei | Yicheng Fu | Phil Cuvin | Caleb Ziems | Yanzhe Zhang | Hao Zhu | Diyi Yang
Findings of the Association for Computational Linguistics: ACL 2025
Human activity is moderated by norms; however, supervision for normative reasoning is sparse, particularly where norms are physically or socially grounded. We thus present EgoNormia ‖ε‖, comprising 1,853 multiple-choice questions (MCQs) (200 in the EgoNormia-verified subset) grounded in egocentric videos of human interactions, enabling the evaluation and improvement of normative reasoning in vision-language models (VLMs). EgoNormia spans seven norm categories: safety, privacy, proxemics, politeness, cooperation, coordination/proactivity, and communication/legibility. To compile this dataset at scale, we propose a novel pipeline for generating grounded MCQs from raw egocentric video. Our work demonstrates that current state-of-the-art VLMs lack robust grounded norm understanding, scoring at most 54% on EgoNormia and 58% on EgoNormia-verified; performance across norm categories indicates significant safety and privacy risks when VLMs are used in real-world agents. We additionally explore methods for improving normative understanding, demonstrating that a naive retrieval-augmented generation (RAG) method using EgoNormia can enhance normative reasoning in VLMs.
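To make the RAG idea concrete, below is a minimal, hypothetical sketch of retrieval-augmented prompting for a norm MCQ. The toy `embed` function, the item schema, and the norm bank are assumptions for illustration only, not the paper's implementation or dataset format.

```python
import numpy as np

# Hypothetical EgoNormia-style item (field names are illustrative,
# not the dataset's actual schema): an egocentric clip's context
# plus a multiple-choice question about the correct next action.
item = {
    "context": "Camera wearer approaches a stranger's open front door.",
    "question": "What should the camera wearer do next?",
    "options": ["A) Walk in", "B) Knock and wait", "C) Look inside", "D) Leave"],
    "answer": "B",
}

# A small bank of norm exemplars to retrieve from (contents invented here;
# in the paper's setting these would be drawn from EgoNormia itself).
norm_bank = [
    {"description": "Respect private property; wait for permission to enter."},
    {"description": "Keep a polite distance from strangers in shared spaces."},
    {"description": "Announce yourself before entering an occupied room."},
]

def embed(text: str) -> np.ndarray:
    """Toy embedder: unit vectors keyed on the text's hash. Swap in a
    real sentence- or video-embedding model for meaningful retrieval."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def retrieve(context: str, bank: list[dict], k: int = 2) -> list[dict]:
    """Rank exemplars by cosine similarity to the clip's context description."""
    q = embed(context)
    return sorted(bank, key=lambda ex: -float(q @ embed(ex["description"])))[:k]

def build_prompt(item: dict, exemplars: list[dict]) -> str:
    """Prepend the retrieved norms to the MCQ, RAG-style."""
    norms = "\n".join(f"Relevant norm: {ex['description']}" for ex in exemplars)
    return f"{norms}\n\n{item['question']}\n" + "\n".join(item["options"])

print(build_prompt(item, retrieve(item["context"], norm_bank)))
```

A real setup would embed with an actual encoder, pass the assembled prompt (and video) to a VLM, and score the model's chosen option against the gold answer to measure accuracy.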