EgoNormia: Benchmarking Physical-Social Norm Understanding

MohammadHossein Rezaei; Yicheng Fu; Phil Cuvin; Caleb Ziems; Yanzhe Zhang; Hao Zhu; Diyi Yang

Abstract

Human activity is moderated by norms; however, supervision for normative reasoning is sparse, particularly where norms are physically or socially grounded. We thus present EgoNormia ‖ϵ‖, comprising 1,853 (200 for EgoNormia-verified) multiple-choice questions (MCQs) grounded within egocentric videos of human interactions, enabling the evaluation and improvement of normative reasoning in vision-language models (VLMs). EgoNormia spans seven norm categories: safety, privacy, proxemics, politeness, cooperation, coordination/proactivity, and communication/legibility. To compile this dataset at scale, we propose a novel pipeline to generate grounded MCQs from raw egocentric video. Our work demonstrates that current state-of-the-art VLMs lack robust grounded norm understanding, scoring a maximum of 54% on EgoNormia and 58% on EgoNormia-verified, with performance across norm categories indicating significant safety and privacy risks when VLMs are used in real-world agents. We additionally explore methods for improving normative understanding, demonstrating that a naive retrieval-based generation (RAG) method using EgoNormia can enhance normative reasoning in VLMs.

Anthology ID: 2025.findings-acl.985
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 19256–19283
URL: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.985/
Cite (ACL): MohammadHossein Rezaei, Yicheng Fu, Phil Cuvin, Caleb Ziems, Yanzhe Zhang, Hao Zhu, and Diyi Yang. 2025. EgoNormia: Benchmarking Physical-Social Norm Understanding. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19256–19283, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): EgoNormia: Benchmarking Physical-Social Norm Understanding (Rezaei et al., Findings 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.findings-acl.985.pdf
