Haoxin Liu
2025
A Picture is Worth A Thousand Numbers: Enabling LLMs Reason about Time Series via Visualization
Haoxin Liu | Chenghao Liu | B. Aditya Prakash
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Large language models (LLMs), with demonstrated reasoning abilities across multiple domains, have been largely underexplored for time-series reasoning (TsR), which is ubiquitous in the real world. In this work, we propose TimerBed, the first comprehensive testbed for evaluating LLMs’ TsR performance. Specifically, TimerBed includes stratified reasoning patterns with real-world tasks, diverse combinations of LLMs and reasoning strategies, and various supervised models as comparison anchors. We perform extensive experiments with TimerBed, test multiple current beliefs, and observe the initial failures of LLMs in TsR, as evidenced by the ineffectiveness of zero-shot (ZST) reasoning and the performance degradation of few-shot in-context learning (ICL). Further, we identify one possible root cause: the numerical modeling of data. To address this, we propose a prompt-based solution, VL-Time, with visualization-modeled data and language-guided reasoning. Experimental results demonstrate that VL-Time enables multimodal LLMs to be non-trivial ZST and powerful ICL reasoners for time series, achieving about 140% average performance improvement and 99% average token cost reduction. TimerBed and VL-Time are available at https://github.com/AdityaLab/DeepTime/.
2024
LSTPrompt: Large Language Models as Zero-Shot Time Series Forecasters by Long-Short-Term Prompting
Haoxin Liu | Zhiyuan Zhao | Jindong Wang | Harshavardhan Kamarthi | B. Aditya Prakash
Findings of the Association for Computational Linguistics: ACL 2024
Time-series forecasting (TSF) finds broad applications in real-world scenarios. Prompting off-the-shelf Large Language Models (LLMs) demonstrates strong zero-shot TSF capabilities while preserving computational efficiency. However, existing prompting methods oversimplify TSF as language next-token prediction, overlooking its dynamic nature and lacking integration with state-of-the-art prompt strategies such as Chain-of-Thought. Thus, we propose LSTPrompt, a novel approach for prompting LLMs in zero-shot TSF tasks. LSTPrompt decomposes TSF into short-term and long-term forecasting sub-tasks, tailoring prompts to each. LSTPrompt guides LLMs to regularly reassess forecasting mechanisms to enhance adaptability. Extensive evaluations show that LSTPrompt consistently outperforms existing prompting methods and achieves results competitive with foundation TSF models.