Evaluating LLMs’ Ability to Understand Numerical Time Series for Text Generation
Mizuki Arai, Tatsuya Ishigaki, Masayuki Kawarada, Yusuke Miyao, Hiroya Takamura, Ichiro Kobayashi
Abstract
Data-to-text generation tasks often take numerical time series as input, such as financial statistics or meteorological data. Although large language models (LLMs) are a powerful approach to data-to-text generation, we still lack a comprehensive understanding of how well they actually understand time-series data. We therefore introduce a benchmark of 18 evaluation tasks that assess LLMs’ ability to interpret numerical time series, grouped into four categories: 1) event detection (identifying maxima and minima); 2) computation (averaging and summation); 3) pairwise comparison (comparing values over time); and 4) inference (imputation and forecasting). Our experiments reveal five key findings: 1) even state-of-the-art LLMs struggle with complex multi-step reasoning; 2) tasks that require extracting values or performing computations within a specified range of the time series significantly reduce accuracy; 3) instruction tuning offers inconsistent improvements for numerical interpretation; 4) reasoning-based models outperform standard LLMs on complex numerical tasks; and 5) LLMs perform interpolation better than forecasting. These results establish a clear baseline and serve as a wake-up call for anyone aiming to combine fluent language with trustworthy numerical precision in time-series scenarios.
- Anthology ID:
- 2025.inlg-main.16
- Volume:
- Proceedings of the 18th International Natural Language Generation Conference
- Month:
- October
- Year:
- 2025
- Address:
- Hanoi, Vietnam
- Editors:
- Lucie Flek, Shashi Narayan, Lê Hồng Phương, Jiahuan Pei
- Venue:
- INLG
- SIG:
- SIGGEN
- Publisher:
- Association for Computational Linguistics
- Pages:
- 232–248
- URL:
- https://preview.aclanthology.org/author-page-you-zhang-rochester/2025.inlg-main.16/
- Cite (ACL):
- Mizuki Arai, Tatsuya Ishigaki, Masayuki Kawarada, Yusuke Miyao, Hiroya Takamura, and Ichiro Kobayashi. 2025. Evaluating LLMs’ Ability to Understand Numerical Time Series for Text Generation. In Proceedings of the 18th International Natural Language Generation Conference, pages 232–248, Hanoi, Vietnam. Association for Computational Linguistics.
- Cite (Informal):
- Evaluating LLMs’ Ability to Understand Numerical Time Series for Text Generation (Arai et al., INLG 2025)
- PDF:
- https://preview.aclanthology.org/author-page-you-zhang-rochester/2025.inlg-main.16.pdf
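The abstract groups the benchmark's 18 tasks into four categories. To make concrete what a gold answer looks like in each category, the following is a minimal sketch that builds one question per illustrative task type from a single series. It is not the authors' benchmark code. The prompt wording, the task names, the `build_instances` and `serialize` helpers, and the convention of 0-based indices with an inclusive range end are all hypothetical choices made for this illustration, and only a handful of task types are shown rather than all 18.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskInstance:
    category: str  # one of the four categories named in the abstract
    task: str      # hypothetical task name within that category
    prompt: str    # hypothetical prompt wording, not the benchmark's own
    gold: object   # reference answer computed directly from the series


def serialize(values):
    """Render a series as comma-separated text; None marks a hidden value."""
    return ", ".join("?" if v is None else f"{v:g}" for v in values)


def build_instances(series, start, end, i, j, masked, horizon):
    """Create one illustrative instance per task type from a single series.

    All index arguments are hypothetical parameters (0-based);
    `end` is treated as inclusive.
    """
    text = serialize(series)
    window = series[start:end + 1]                 # the "specified range"
    with_gap = [None if k == masked else v for k, v in enumerate(series)]
    history = series[:-horizon]                    # context for forecasting
    return [
        # 1) Event detection: locate the extrema.
        TaskInstance("event detection", "argmax",
                     f"Series: {text}. At which index is the maximum?",
                     series.index(max(series))),
        TaskInstance("event detection", "argmin",
                     f"Series: {text}. At which index is the minimum?",
                     series.index(min(series))),
        # 2) Computation restricted to a range of the series.
        TaskInstance("computation", "range mean",
                     f"Series: {text}. Mean of indices {start}-{end}?",
                     mean(window)),
        TaskInstance("computation", "range sum",
                     f"Series: {text}. Sum of indices {start}-{end}?",
                     sum(window)),
        # 3) Pairwise comparison between two time steps.
        TaskInstance("pairwise comparison", "greater",
                     f"Series: {text}. Is index {i} greater than index {j}?",
                     series[i] > series[j]),
        # 4) Inference: the gold answer is the value that was hidden.
        TaskInstance("inference", "imputation",
                     f"Series: {serialize(with_gap)}. What is the missing value?",
                     series[masked]),
        TaskInstance("inference", "forecasting",
                     f"Series: {serialize(history)}. Predict the next {horizon} values.",
                     series[-horizon:]),
    ]


if __name__ == "__main__":
    demo = [12.0, 15.5, 14.0, 18.0, 16.5, 13.0]
    for inst in build_instances(demo, start=1, end=3, i=0, j=2, masked=4, horizon=2):
        print(f"[{inst.category}] {inst.task}: gold = {inst.gold}")
```

Comparing a model's reply against `gold` would still require a scoring rule (for example, exact match for indices and booleans and a numeric tolerance for means, imputed values, and forecasts); the abstract does not state which rule the benchmark uses.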