2025
ChronoSense: Exploring Temporal Understanding in Large Language Models with Time Intervals of Events
Duygu Sezen Islakoglu | Jan-Christoph Kalo
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Large Language Models (LLMs) still face significant challenges in reasoning and arithmetic. Although temporal reasoning has attracted increasing research attention, comprehensive testing of Allen's interval relations (e.g., before, after, during), a fundamental framework for temporal relationships, remains underexplored. To fill this gap, we present ChronoSense, a new benchmark for evaluating LLMs' temporal understanding. It comprises 16 tasks covering the identification of the Allen relation between two temporal events as well as temporal arithmetic. We assess the performance of seven recent LLMs. The results indicate that models handle Allen relations, even symmetric ones, quite differently. Moreover, the findings suggest that the models may rely on memorization to answer time-related questions. Overall, the models' low performance highlights the need for improved temporal understanding in LLMs. Our dataset and source code are available at https://github.com/duyguislakoglu/chronosense.
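For context, Allen's interval algebra defines thirteen mutually exclusive relations between two time intervals, each decidable from the intervals' start and end points. The sketch below is illustrative only (it is not code from the ChronoSense repository) and assumes intervals with comparable numeric endpoints where start < end:

```python
def allen_relation(a_start, a_end, b_start, b_end):
    """Return the Allen interval relation of A = [a_start, a_end]
    relative to B = [b_start, b_end] (one of the 13 relations)."""
    # Disjoint intervals
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    # Touching at a single endpoint
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    # Shared endpoints
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    # Proper containment
    if a_start > b_start and a_end < b_end:
        return "during"
    if a_start < b_start and a_end > b_end:
        return "contains"
    # Remaining case: partial overlap
    return "overlaps" if a_start < b_start else "overlapped-by"
```

For example, `allen_relation(1, 3, 2, 5)` yields `"overlaps"`, while `allen_relation(2, 4, 1, 5)` yields `"during"`. Note that relations come in symmetric (converse) pairs, e.g. before/after and during/contains, which is the symmetry the abstract refers to.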