Interesting Culture: Social Relation Recognition from Videos via Culture De-confounding

Yuxuan Zhang, Yangfu Zhu, Haorui Wang, Bin Wu


Abstract
Social relationship recognition, one of the fundamental tasks in video understanding, contributes to the construction and application of multi-modal knowledge graphs. Previous work has mainly focused on two aspects: generating character graphs and multi-modal fusion. However, it often overlooks the impact of cultural differences on relationship recognition. Specifically, relationship recognition models are susceptible to being misled by training data drawn from a specific cultural context. This can lead them to learn culture-specific spurious correlations, ultimately restricting their ability to recognize social relationships in other cultures. Therefore, we employ a customized causal graph to analyze the confounding effects of culture in the relationship recognition task. We propose a Cultural Causal Intervention (CCI) model that mitigates the influence of culture as a confounding factor in the visual and textual modalities. Importantly, we also construct a novel video social relation recognition (CVSR) dataset to facilitate discussion and research on cultural factors in video tasks. Extensive experiments conducted on several datasets demonstrate that the proposed model surpasses state-of-the-art methods.
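The idea of a confounder producing a spurious correlation, and of an intervention removing it, can be illustrated with a toy backdoor adjustment. This is only a minimal sketch of the general technique, not the CCI model described in the abstract: the binary variables (culture C, cue X, relation Y), the function names, and every probability below are hypothetical assumptions chosen for illustration. In this toy setup the cue X has no causal effect on Y; both depend on culture.

```python
# Toy backdoor adjustment with one binary confounder.
# All names and numbers are hypothetical; none come from the paper.

P_C = {0: 0.5, 1: 0.5}             # P(C = c): prior over two cultures
P_X1_GIVEN_C = {0: 0.9, 1: 0.1}    # P(X = 1 | C = c): how often the cue appears
P_Y1_GIVEN_XC = {                  # P(Y = 1 | X = x, C = c), keyed by (x, c)
    (0, 0): 0.8, (1, 0): 0.8,      # within a culture, Y does not depend on X
    (0, 1): 0.2, (1, 1): 0.2,
}


def p_x_given_c(x, c):
    """P(X = x | C = c) for binary x."""
    p1 = P_X1_GIVEN_C[c]
    return p1 if x == 1 else 1.0 - p1


def observational(x):
    """P(Y = 1 | X = x): each culture is weighted by the posterior P(c | x)."""
    joint = {c: p_x_given_c(x, c) * P_C[c] for c in P_C}
    norm = sum(joint.values())
    return sum(P_Y1_GIVEN_XC[(x, c)] * joint[c] / norm for c in P_C)


def interventional(x):
    """P(Y = 1 | do(X = x)): each culture is weighted by its prior P(c)."""
    return sum(P_Y1_GIVEN_XC[(x, c)] * P_C[c] for c in P_C)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, round(observational(x), 2), round(interventional(x), 2))
```

Under these assumed numbers the observational estimate differs sharply between X = 0 and X = 1, while the adjusted estimate is identical for both, which is the sense in which conditioning on a cue can pick up a culture-specific correlation that the intervention discards.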
Anthology ID:
2025.findings-emnlp.277
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5174–5184
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.277/
DOI:
10.18653/v1/2025.findings-emnlp.277
Cite (ACL):
Yuxuan Zhang, Yangfu Zhu, Haorui Wang, and Bin Wu. 2025. Interesting Culture: Social Relation Recognition from Videos via Culture De-confounding. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 5174–5184, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Interesting Culture: Social Relation Recognition from Videos via Culture De-confounding (Zhang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.277.pdf
Checklist:
 2025.findings-emnlp.277.checklist.pdf