Huaiwen Zhang


2026

Recently, human motion understanding has been a prominent area of research due to its critical importance in many fields. The key to advancing this understanding lies in the precise alignment between motion and linguistic modalities. Existing methods mainly follow two paradigms: global contrastive alignment and vocabulary space-based alignment. However, motion sequences exhibit sequential spatiotemporal dynamics while text conveys abstract semantics, leading to a fundamental mismatch in semantic levels and granularities. This undermines cross-modal alignment and results in suboptimal downstream performance. To alleviate this, we introduce a modality-shared codebook that enables unified representation learning and precise alignment of motion and linguistic modalities. Each codeword in the codebook is regularized to encode cross-modality shared semantics, and we leverage sparse activation and a distribution consistency loss to ensure that matched motion and text are represented by the same set of codewords. Additionally, we introduce a locality-aware Gaussian encoder to refine pose features and design a hard-negative guided loss to strengthen alignment discriminability. Extensive experiments across various language-motion evaluations, including text-motion retrieval, text-motion grounding, and motion captioning, demonstrate that our model significantly surpasses current state-of-the-art methods.
The proliferation of short video fake news threatens social stability. Current detection methods rely either on black-box Multimodal Small Language Models (MSLMs), which suffer from poor explainability and superficial understanding, or on specific prompt strategies for Multimodal Large Language Models (MLLMs) that underutilize their reasoning capabilities and knowledge. To address these challenges, we propose a novel multi-agent framework named CSI for short video fake news detection. CSI implements two key units: 1) the Multimodal Forensics Unit (MFU), which performs synchronous multimodal deconstruction and external knowledge retrieval to collect comprehensive evidence; and 2) the Case Review Unit (CRU), which first employs collaborative discussion to facilitate viewpoint interaction and obtain the review result. Subsequently, the Adjudicator integrates evidence and the review result via multiple attention mechanisms to interact with the news, ensuring a robust verdict. Extensive experiments on two real-world datasets demonstrate that CSI provides rigorous explanations while achieving state-of-the-art performance. Our code is available at: https://github.com/VFCenter/CSI.

2025

Rumor detection on social media has become crucial due to the rapid spread of misinformation. Existing approaches primarily focus on within-domain tasks, resulting in suboptimal performance in cross-domain scenarios due to domain shift. To address this limitation, we draw inspiration from the strong generalization capabilities of Test-Time Adaptation (TTA) and propose a novel framework to enhance rumor detection performance across different domains. Specifically, we introduce Test-Time Adaptation for Rumor Detection (T2ARD), which incorporates both single-domain model and target graph adaptation strategies tailored to the unique requirements of cross-domain rumor detection. T2ARD utilizes a graph adaptation module that updates the graph structure and node attributes through multi-level self-supervised contrastive learning, aiming to derive invariant graph representations. To mitigate the impact of significant distribution shifts on self-supervised signals, T2ARD performs model adaptation by using annotations from Large Language Models (LLMs) on the target graph to produce pseudo-labels as supervised signals. Experiments conducted on four widely used cross-domain datasets demonstrate that T2ARD achieves state-of-the-art performance, surpassing existing methods in rumor detection.
Large Language Models (LLMs) can assist multimodal fake news detection by predicting pseudo labels. However, LLM-generated pseudo labels alone demonstrate poor performance compared to traditional detection methods, making their effective integration non-trivial. In this paper, we propose Global Label Propagation Network with LLM-based Pseudo Labeling (GLPN-LLM) for multimodal fake news detection, which integrates LLM capabilities via label propagation techniques. The global label propagation can utilize LLM-generated pseudo labels, enhancing prediction accuracy by propagating label information among all samples. For label propagation, a mask-based mechanism is designed to prevent label leakage during training by ensuring that training nodes do not propagate their own labels back to themselves. Experimental results on benchmark datasets show that by synergizing LLMs with label propagation, our model achieves superior performance over state-of-the-art baselines.
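The masking idea in the abstract above can be illustrated with a minimal sketch. This is not the GLPN-LLM implementation: the function names `cosine` and `propagate_labels`, the use of cosine similarity as the edge weight, and the single propagation step are all assumptions made for illustration. The sketch only shows why excluding a node's own label from its aggregation prevents that label from leaking into its own prediction.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def propagate_labels(features, labels, num_classes, mask_self=True):
    """One propagation step over all samples (hypothetical helper).

    Each node's class scores are a similarity-weighted average of the
    labels of the other nodes. `labels` may mix ground-truth labels and
    LLM pseudo labels. With mask_self=True a node never receives its
    own label, which is the leakage the abstract describes.
    """
    n = len(features)
    scores = []
    for i in range(n):
        row = [0.0] * num_classes
        total = 0.0
        for j in range(n):
            if mask_self and i == j:
                continue  # mask: do not propagate a node's label to itself
            weight = max(cosine(features[i], features[j]), 0.0)
            row[labels[j]] += weight
            total += weight
        if total > 0:
            scores.append([s / total for s in row])
        else:
            # No similar neighbours: fall back to a uniform distribution.
            scores.append([1.0 / num_classes] * num_classes)
    return scores


# Node 0 carries label 1, but its only similar neighbour (node 1) has label 0.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
labels = [1, 0, 1]
masked = propagate_labels(features, labels, num_classes=2, mask_self=True)
unmasked = propagate_labels(features, labels, num_classes=2, mask_self=False)
```

With the mask, node 0's scores depend only on its neighbours; without it, node 0's own label contributes to its score, so a model trained on such scores could simply read its target back.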