DateLogicQA: Benchmarking Temporal Biases in Large Language Models
Gagan Bhatia, MingZe Tang, Cristina Mahanta, Madiha Kazi, Maxime Peyrard, Wei Zhao
Abstract
We introduce DateLogicQA, a human-curated benchmark of 190 questions specifically designed to understand temporal bias in Large Language Models (LLMs). Covering seven date formats across past, present, and future contexts, DateLogicQA examines four reasoning types: commonsense, factual, conceptual, and numerical. Through human-led evaluations of 12 state-of-the-art LLMs, we identify Representation-Level Bias, arising from suboptimal embeddings that distort date semantics, and Logical-Level Bias, manifesting when correct date tokens yield flawed temporal reasoning. Our findings underscore persistent challenges in handling various date formats and temporal contexts, revealing the need for more robust pretraining data, targeted post-training methods, and precise tokenization strategies. By illuminating these biases, we provide actionable insights to guide the development of LLMs for accurate temporal reasoning across diverse real-world applications.
- Anthology ID: 2025.naacl-srw.32
- Original: 2025.naacl-srw.32v1
- Version 2: 2025.naacl-srw.32v2
- Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
- Month: April
- Year: 2025
- Address: Albuquerque, USA
- Editors: Abteen Ebrahimi, Samar Haider, Emmy Liu, Sammar Haider, Maria Leonor Pacheco, Shira Wein
- Venues: NAACL | WS
- Publisher: Association for Computational Linguistics
- Pages: 321–332
- URL: https://preview.aclanthology.org/better-preview-link/2025.naacl-srw.32/
- DOI: 10.18653/v1/2025.naacl-srw.32
- Cite (ACL): Gagan Bhatia, MingZe Tang, Cristina Mahanta, Madiha Kazi, Maxime Peyrard, and Wei Zhao. 2025. DateLogicQA: Benchmarking Temporal Biases in Large Language Models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 321–332, Albuquerque, USA. Association for Computational Linguistics.
- Cite (Informal): DateLogicQA: Benchmarking Temporal Biases in Large Language Models (Bhatia et al., NAACL 2025)
- PDF: https://preview.aclanthology.org/better-preview-link/2025.naacl-srw.32.pdf