TS-CLIP: Time Series Understanding by CLIP

Ziwen Chen, Xiaoyuan Zhang, Ming Zhu


Abstract
Contrastive Language–Image Pre-training (CLIP) has recently demonstrated remarkable success in aligning vision and language. Aligning time series with text leverages the rich semantic cues of language to enhance interpretability and generalization, yet this direction remains largely underexplored. Although applying the CLIP training paradigm to time-series and language pairs is promising, it may result in label collapse because semantic annotations are sparse and time-series data lacks visual cues. To address this, we introduce Time Series CLIP (TS-CLIP), a novel approach that tackles label collapse using a synonym bank mechanism. The synonym bank exploits word analogy phenomena to generate potential synonym embeddings as alignment targets. Specifically, it allows a time series to be aligned with a word distribution instead of a single precise textual description. We conducted extensive zero-shot and few-shot experiments on 128 sub-datasets from the UCR archive. The results show that TS-CLIP achieves state-of-the-art (SOTA) performance in zero-shot settings on 51 datasets. Comprehensive ablation studies and visualization analyses reveal that TS-CLIP effectively aligns time series with natural language. To the best of our knowledge, this is the first foundational model to achieve general time series and natural language alignment. TS-CLIP introduces a new paradigm for the semantic understanding of time series and opens the possibility of integrating the time series modality into multimodal large models.
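The label-collapse problem and the idea of aligning against varied synonym targets can be illustrated with a small NumPy sketch. This is not the authors' implementation: `clip_loss` is the standard symmetric contrastive loss used in CLIP-style training, while `sample_synonym_targets`, the random `offsets` array, and all embedding values are hypothetical stand-ins for the synonym bank, chosen only for illustration. When every sample in a batch shares one text embedding, the time-series-to-text softmax is uniform, so that half of the loss is stuck at log(batch size) regardless of the time-series encoder.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale each row to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(ts_emb, txt_emb, temperature=0.07):
    # Symmetric contrastive loss over a batch of paired embeddings:
    # row i of ts_emb is the positive match for row i of txt_emb.
    ts = l2_normalize(ts_emb)
    txt = l2_normalize(txt_emb)
    logits = ts @ txt.T / temperature
    idx = np.arange(len(ts))
    loss_ts = -log_softmax(logits, axis=1)[idx, idx].mean()   # time series -> text
    loss_txt = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> time series
    return 0.5 * (loss_ts + loss_txt)

def sample_synonym_targets(label_emb, offsets, rng):
    # Hypothetical synonym bank (assumption, not from the paper): shift each
    # label embedding by a randomly chosen offset vector, in the spirit of
    # word-analogy arithmetic, so targets form a distribution around the label.
    picks = rng.integers(0, len(offsets), size=len(label_emb))
    return label_emb + offsets[picks]

rng = np.random.default_rng(0)
dim, batch = 16, 4
ts_emb = rng.normal(size=(batch, dim))

# Sparse annotation: every sample in the batch carries the same label embedding.
label_emb = np.tile(rng.normal(size=(1, dim)), (batch, 1))
offsets = 0.5 * rng.normal(size=(8, dim))  # made-up offsets for illustration

collapsed = clip_loss(ts_emb, label_emb)
varied_targets = sample_synonym_targets(label_emb, offsets, rng)
varied = clip_loss(ts_emb, varied_targets)
print(f"identical targets: {collapsed:.3f}, sampled synonym targets: {varied:.3f}")
```

With identical targets the contrastive objective cannot distinguish samples within the batch; sampling distinct targets per sample restores a usable signal. How the paper actually constructs and weights its synonym embeddings is described in the full text, not here.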
Anthology ID:
2025.emnlp-main.231
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4646–4664
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.231/
Cite (ACL):
Ziwen Chen, Xiaoyuan Zhang, and Ming Zhu. 2025. TS-CLIP: Time Series Understanding by CLIP. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 4646–4664, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
TS-CLIP: Time Series Understanding by CLIP (Chen et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.231.pdf
Checklist:
 2025.emnlp-main.231.checklist.pdf