Synergizing Multimodal Temporal Knowledge Graphs and Large Language Models for Social Relation Recognition

Haorui Wang, Zheng Wang, Yuxuan Zhang, Bo Wang, Bin Wu


Abstract
Recent years have witnessed remarkable advances in Large Language Models (LLMs). In social relation recognition, however, LLMs face significant challenges: they are trained on sequential data, which inherently limits their capacity to model complex graph-structured relationships. To address this limitation, we propose mtKG-LLM, a novel low-coupling method that synergizes multimodal temporal Knowledge Graphs (KGs) with LLMs for social relation reasoning. Specifically, we extract multimodal information from videos and model the social network of each scene as a spatial KG. Temporal KGs are constructed from the spatial KGs and updated along the timeline to support long-term reasoning. We then retrieve multi-scale information from this graph-structured knowledge so that LLMs can recognize the underlying social relations. Extensive experiments demonstrate that our method achieves state-of-the-art performance in social relation recognition and that our framework is effective in bridging the gap between KGs and LLMs. Our code will be released after acceptance.
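The pipeline described in the abstract (per-scene spatial KGs, a temporal KG updated along the timeline, and retrieval of graph evidence for an LLM prompt) can be illustrated with a rough sketch. This is not the authors' implementation, which has not been released; every name here (`TemporalKG`, `update`, `retrieve`, `build_prompt`, the triple format, and the example characters) is a hypothetical choice made for illustration, and the retrieval shown is a simple pairwise lookup rather than the multi-scale retrieval the paper proposes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TemporalKG:
    """Hypothetical temporal KG: maps each (head, cue, tail) triple to the
    list of scene indices in which that triple was observed."""
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def update(self, scene_idx, spatial_kg):
        # Merge one scene's spatial KG (an iterable of triples) into the timeline.
        for triple in spatial_kg:
            self.edges[triple].append(scene_idx)

    def retrieve(self, a, b):
        # Collect triples that connect exactly the pair (a, b), in either
        # direction, ordered by the first scene in which they appear.
        pair = {a, b}
        hits = [(t, scenes) for t, scenes in self.edges.items()
                if {t[0], t[2]} == pair]
        return sorted(hits, key=lambda item: item[1][0])


def build_prompt(tkg, a, b):
    # Serialize the retrieved evidence as plain text for an LLM query.
    lines = [f"scenes {scenes}: {h} -{r}-> {t}"
             for (h, r, t), scenes in tkg.retrieve(a, b)]
    return (f"Evidence about {a} and {b}:\n" + "\n".join(lines)
            + "\nWhat is their social relation?")


# Toy spatial KGs for three consecutive scenes (assumed data, not from the paper).
scenes = [
    [("Anna", "talks_to", "Ben")],
    [("Anna", "hugs", "Ben"), ("Ben", "talks_to", "Cara")],
    [("Anna", "talks_to", "Ben")],
]

tkg = TemporalKG()
for idx, spatial_kg in enumerate(scenes):
    tkg.update(idx, spatial_kg)

prompt = build_prompt(tkg, "Anna", "Ben")
```

Keeping the graph construction separate from the prompt-building step mirrors the low-coupling idea in the abstract: under this sketch's assumptions, the LLM only ever sees serialized text, so the graph side and the model side could be changed independently.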
Anthology ID:
2025.emnlp-main.224
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4501–4520
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.224/
Cite (ACL):
Haorui Wang, Zheng Wang, Yuxuan Zhang, Bo Wang, and Bin Wu. 2025. Synergizing Multimodal Temporal Knowledge Graphs and Large Language Models for Social Relation Recognition. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4501–4520, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Synergizing Multimodal Temporal Knowledge Graphs and Large Language Models for Social Relation Recognition (Wang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.224.pdf
Checklist:
 2025.emnlp-main.224.checklist.pdf