Temporal Token Matters: Investigating and Interpreting the Consistency of Temporal Ordering in Large Language Models

Zhen Yang, Xinyue Zhang, Ping Jian, Chengzhi Li, Zhongbin Guo, Jiaping Feng, Wenpeng Lu


Abstract
Despite their remarkable performance across numerous tasks, Large Language Models (LLMs) still exhibit notable deficiencies in temporal reasoning, even in simple event ordering tasks. For instance, a slight alteration in the temporal phrasing of a question (e.g., changing "Is event A before B?" to "Is event A after B?") can lead LLMs to hallucinate and produce inconsistent answers, reflecting a lack of robust temporal reasoning. Although many prior studies have focused on benchmarking and improving the temporal reasoning ability of LLMs, little is known about the intrinsic mechanisms within LLMs when performing temporal reasoning. In this work, we investigate the mechanistic interpretability of temporal ordering within event temporal reasoning through a structured "Identify-Interpret-Verify" pipeline. We first employ path patching to identify a sparse subset of attention heads that are causally responsible for reasoning outcomes. Detailed pattern analysis reveals that these key heads specialize in attending to either temporal keywords (semantic cues) or structural delimiters (syntactic cues). Furthermore, we rigorously validate the observed mechanism through comprehensive intervention-based experiments, ranging from head ablation to targeted attention modulation. We demonstrate that dynamically modulating the attention of these specific heads can robustly enhance model performance, which serves as strong empirical evidence that our identified mechanism faithfully captures the internal logic of temporal ordering in LLMs.
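The inconsistency described in the abstract can be illustrated with a minimal sketch. It assumes yes/no answers and that the two events are never simultaneous, so a consistent model should answer "yes" to exactly one of the "before" and "after" phrasings. The function names `is_consistent` and `consistency_rate` are hypothetical and are not taken from the paper; this is not necessarily the metric the authors use.

```python
from typing import Iterable, Tuple


def is_consistent(answer_before: str, answer_after: str) -> bool:
    """Return True if the answers to the paired questions do not contradict.

    answer_before: model answer to "Is event A before B?"
    answer_after:  model answer to "Is event A after B?"
    Assumption: answers are "yes"/"no" and A and B are not simultaneous,
    so exactly one of the two answers should be "yes".
    """
    said_before = answer_before.strip().lower() == "yes"
    said_after = answer_after.strip().lower() == "yes"
    return said_before != said_after


def consistency_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (answer_before, answer_after) pairs that are consistent."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_consistent(b, a) for b, a in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical model outputs for three event pairs.
    outputs = [("yes", "no"), ("yes", "yes"), ("no", "yes")]
    print(consistency_rate(outputs))
```

A pair such as ("yes", "yes") counts as inconsistent here because the model affirms both orderings of the same two events.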
Anthology ID:
2026.findings-acl.123
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2577–2596
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.123/
Cite (ACL):
Zhen Yang, Xinyue Zhang, Ping Jian, Chengzhi Li, Zhongbin Guo, Jiaping Feng, and Wenpeng Lu. 2026. Temporal Token Matters: Investigating and Interpreting the Consistency of Temporal Ordering in Large Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 2577–2596, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Temporal Token Matters: Investigating and Interpreting the Consistency of Temporal Ordering in Large Language Models (Yang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.123.pdf
Checklist:
2026.findings-acl.123.checklist.pdf