Temporal Token Matters: Investigating and Interpreting the Consistency of Temporal Ordering in Large Language Models
Zhen Yang, Xinyue Zhang, Ping Jian, Chengzhi Li, Zhongbin Guo, Jiaping Feng, Wenpeng Lu
Abstract
Despite the remarkable performance across numerous tasks, Large Language Models (LLMs) still exhibit notable deficiencies in temporal reasoning, even in simple event ordering tasks. For instance, a slight alteration in the temporal phrasing of the question (e.g., changing "Is event A before B?" to "Is event A after B?") can lead LLMs to hallucinate and produce inconsistent answers, reflecting a lack of robust temporal reasoning. Although many prior studies have focused on benchmarking and improving the temporal reasoning ability of LLMs, little is known about the intrinsic mechanisms within LLMs when performing temporal reasoning. In this work, we investigate the mechanistic interpretability of temporal ordering within event temporal reasoning through a structured "Identify-Interpret-Verify" pipeline. We first employ path patching to identify a sparse subset of attention heads that are causally responsible for reasoning outcomes. Detailed pattern analysis reveals that these key heads specialize in attending to either temporal keywords (semantic cues) or structural delimiters (syntactic cues). Furthermore, we rigorously validate the observed mechanism through comprehensive intervention-based experiments, ranging from head ablation to targeted attention modulation. We demonstrate that dynamically modulating the attention of these specific heads can robustly enhance model performance, which serves as strong empirical evidence that our identified mechanism faithfully captures the internal logic of temporal ordering in LLMs.
- Anthology ID:
- 2026.findings-acl.123
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2577–2596
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.123/
- Cite (ACL):
- Zhen Yang, Xinyue Zhang, Ping Jian, Chengzhi Li, Zhongbin Guo, Jiaping Feng, and Wenpeng Lu. 2026. Temporal Token Matters: Investigating and Interpreting the Consistency of Temporal Ordering in Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 2577–2596, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Temporal Token Matters: Investigating and Interpreting the Consistency of Temporal Ordering in Large Language Models (Yang et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.123.pdf