Zhenyu Guan

2026

Multimodal representation learning primarily relies on contrastive objectives such as InfoNCE to align diverse modalities. However, these methods focus almost exclusively on directional alignment and often neglect the intrinsic role of embedding magnitudes (L2-norms) in the contrastive process. To bridge this gap, we propose L2Dir, a plug-and-play framework that jointly optimizes L2-norm alignment and directional consistency. L2Dir is efficient: it requires no extra data, distillation, or external supervision, and it integrates seamlessly into existing pipelines through a lightweight MLP that reconstructs magnitudes from frozen backbone features. Extensive evaluations across 95 tasks within the UniIR and VLM2Vec-V2 frameworks show that L2Dir yields consistent and significant gains over established baselines across various backbones and scales, indicating that explicit magnitude modeling is a versatile and potent strategy for refining unsupervised multimodal representations. The source code for L2Dir in VLM2Vec-V2 is available in the supplementary materials.
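To illustrate why standard InfoNCE ignores magnitudes, here is a minimal sketch. It computes InfoNCE over cosine similarities (both sides are L2-normalized, so rescaling an embedding leaves the loss unchanged) and adds a magnitude term in which a tiny MLP predicts each embedding's L2 norm from frozen features. The names `magnitude_mlp` and `l2dir_style_loss`, the squared-error form of the magnitude term, and the weight `lam` are all hypothetical choices for illustration; the abstract does not specify the actual L2Dir objective.

```python
import math
import random

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = l2_norm(v)
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(queries, keys, temperature=0.07):
    """Standard InfoNCE over cosine similarities; key i is the positive for query i.
    Normalizing both sides means only direction matters; magnitudes are discarded."""
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    loss = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum_exp - logits[i]
    return loss / len(q)

def magnitude_mlp(features, w1, b1, w2, b2):
    """Hypothetical one-hidden-layer MLP mapping frozen features to a norm estimate."""
    hidden = [max(0.0, dot(row, features) + b) for row, b in zip(w1, b1)]  # ReLU
    z = dot(w2, hidden) + b2
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # stable softplus, output > 0

def l2dir_style_loss(queries, keys, features, params, lam=0.1):
    """Hypothetical joint objective: directional InfoNCE plus squared error between
    MLP-predicted norms and the observed (pre-normalization) query norms."""
    directional = info_nce(queries, keys)
    magnitude = sum(
        (magnitude_mlp(f, *params) - l2_norm(v)) ** 2 for f, v in zip(features, queries)
    ) / len(queries)
    return directional + lam * magnitude

# Toy usage: 3 query/key pairs in 4 dimensions, random small MLP weights.
random.seed(0)
dim, hidden_dim = 4, 8
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
keys = [[x + random.gauss(0, 0.1) for x in q] for q in queries]
features = queries  # stand-in for frozen backbone features
params = (
    [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(hidden_dim)],
    [0.0] * hidden_dim,
    [random.gauss(0, 0.1) for _ in range(hidden_dim)],
    0.0,
)
print(round(l2dir_style_loss(queries, keys, features, params), 4))
```

Scaling every query by a constant leaves `info_nce` unchanged, while the magnitude term changes; that asymmetry is the gap a magnitude-aware objective is meant to address.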


Current LLM role-playing systems model persona as a monolithic, static attribute, conflating identity consistency with emotional rigidity. This leads to either robotic repetition or catastrophic persona drift under sustained interaction. We introduce Dynamic Persona Coherence, a framework that decouples Identity-Layer Stability (time-invariant traits) from Adaptive-Layer Appropriateness (history-dependent psychological evolution). We operationalize this through the L/M/S Psychological State Model, which represents persona dynamics across long-term identity, mid-term meaning/stress accumulation, and short-term affect. On top of this state representation, a closed-loop alignment system comprising an automated evaluator (Persona Consistency Critic, PCC), a selective repository (Persona Case Repository, PCR), and a trajectory-adjusting corrector (Persona Drift Suppressor, PDS) enables autonomous coherence repair. Experiments on GPT-4o, Claude-3.5-Sonnet, and DeepSeek-V3.2 demonstrate consistent improvements (+16–84% PCC gains).
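To make the layer decomposition and the closed loop concrete, here is a toy sketch. Everything in it is hypothetical: persona traits are modeled as string sets, the L layer (`identity`) is frozen, the M layer (`stress`) accumulates slowly from negative turns, the S layer (`affect`) tracks the latest turn, and simple rule-based functions stand in for the LLM-based PCC, PCR, and PDS components. The decay constants, the trait-overlap score, and the 0.66 threshold are illustrative assumptions, not values from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class PersonaState:
    """Toy L/M/S state (hypothetical): long-term identity, mid-term stress, short-term affect."""
    identity: frozenset   # L: time-invariant traits, never updated
    stress: float = 0.0   # M: accumulates across turns, decays slowly
    affect: float = 0.0   # S: follows the latest turn, decays quickly

    def update(self, valence: float) -> None:
        """valence in [-1, 1]: how positive the latest user turn was (assumed input)."""
        self.affect = 0.3 * self.affect + 0.7 * valence
        # Negative turns raise stress; positive turns lower it; clamp to [0, 1].
        self.stress = min(1.0, max(0.0, 0.9 * self.stress - 0.2 * valence))

def critic(state, response_traits):
    """Hypothetical stand-in for PCC: fraction of identity traits the response keeps."""
    return len(state.identity & response_traits) / len(state.identity)

@dataclass
class CaseRepository:
    """Hypothetical stand-in for PCR: stores only responses scoring above a threshold."""
    threshold: float = 0.66
    cases: list = field(default_factory=list)

    def maybe_store(self, response_traits, score):
        if score >= self.threshold:
            self.cases.append(response_traits)

def suppress_drift(state, response_traits, repo):
    """Hypothetical stand-in for PDS: re-inject identity traits from the latest stored
    case (or the identity itself if none), keeping the response's adaptive traits."""
    anchor = repo.cases[-1] if repo.cases else state.identity
    return response_traits | (anchor & state.identity)

def step(state, repo, valence, response_traits):
    """One closed-loop turn: update state, evaluate, correct if drifted, then store."""
    state.update(valence)
    score = critic(state, response_traits)
    if score < repo.threshold:
        response_traits = suppress_drift(state, response_traits, repo)
        score = critic(state, response_traits)
    repo.maybe_store(response_traits, score)
    return response_traits, score

state = PersonaState(identity=frozenset({"sarcastic", "loyal", "curious"}))
repo = CaseRepository()
turns = [
    (0.5, {"sarcastic", "loyal", "curious", "cheerful"}),
    (-0.8, {"loyal", "anxious"}),  # drifted response: lost two identity traits
]
for valence, raw_traits in turns:
    traits, score = step(state, repo, valence, set(raw_traits))
    print(round(score, 2), sorted(traits), round(state.stress, 2), round(state.affect, 2))
```

In the second turn the corrector restores the lost identity traits but keeps the new `anxious` trait and does not copy the earlier `cheerful` one, mirroring the separation between identity-layer stability and adaptive-layer appropriateness.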