Yueyang Ding
2026
L2Dir: Integrating L2-Norm and Directional Alignment for Unsupervised Contrastive Representation Learning in Multimodal Retrieval
Tianyu Zong | Rui Dai | Hongzhu Yi | Yuanxiang Wang | Zhenghao Zhang | Zhenyu Guan | Yujia Yang | Bingkang Shi | Yueyang Ding | Xiangxiang Chu | Kaikui Liu | Jungang Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Multimodal representation learning primarily relies on contrastive objectives such as InfoNCE to align diverse modalities. However, these methods focus almost exclusively on directional alignment and often neglect the intrinsic role of embedding magnitudes (L2-norm) in the contrastive process. To bridge this gap, we propose L2Dir, a plug-and-play framework designed to jointly optimize L2-norm alignment and Directional consistency. L2Dir is highly efficient: it requires no extra data, distillation, or external supervision, and it can be integrated seamlessly into existing pipelines by employing a lightweight MLP to reconstruct magnitudes from frozen backbone features. Extensive evaluations across 95 tasks using the UniIR and VLM2Vec-V2 frameworks demonstrate that L2Dir yields consistent and significant performance gains over established baselines across various backbones and scales, showing that explicit magnitude modeling is a versatile and potent strategy for refining unsupervised multimodal representations. The source code for L2Dir in VLM2Vec-V2 is available in the supplementary materials.
LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
Yueyang Ding | HaoPeng Zhang | Rui Dai | Yi Wang | Tianyu Zong | Kaikui Liu | Xiangxiang Chu
Findings of the Association for Computational Linguistics: ACL 2026
Comprehensive understanding of time series remains a significant challenge for Large Language Models (LLMs). Current research is hindered by fragmented task definitions and benchmarks with inherent ambiguities, precluding rigorous evaluation and the development of unified Time Series Reasoning Models (TSRMs). To bridge this gap, we formalize Time Series Reasoning (TSR) via a four-level taxonomy of increasing cognitive complexity. We introduce HiTSR (Hierarchical Time Series Reasoning), a dataset comprising 83k samples with diverse task combinations and verified Chain-of-Thought (CoT) trajectories. Leveraging HiTSR, we propose LLaTiSA, a strong TSRM that integrates visualized patterns with precision-calibrated numerical tables to enhance the temporal perception of Vision-Language Models (VLMs). Through a multi-stage curriculum fine-tuning strategy, LLaTiSA achieves superior performance and exhibits robust out-of-distribution generalization across diverse TSR tasks and real-world scenarios. We will publicly release the code, dataset, and model weights.