TCP: a Benchmark for Temporal Constraint-Based Planning

Zifeng Ding, Sikuan Yan, Moy Yuan, Xianglong Hu, Fangru Lin, Andreas Vlachos


Abstract
Temporal reasoning and planning are essential capabilities for large language models (LLMs), yet most existing benchmarks evaluate them in isolation and under limited forms of complexity. To address this gap, we introduce the Temporal Constraint-based Planning (TCP) benchmark, which jointly assesses both capabilities. Each instance in TCP features a naturalistic dialogue around a collaborative project, where diverse and interdependent temporal constraints are explicitly or implicitly expressed, and models must infer an optimal schedule that satisfies all constraints. To construct TCP, we generate abstract problem prototypes that are then paired with realistic scenarios from various domains and enriched into dialogues using an LLM. A human quality check is performed on a sampled subset to confirm the reliability of our benchmark. We evaluate state-of-the-art LLMs and find that even the strongest models may struggle with TCP, highlighting its difficulty and revealing limitations in LLMs’ temporal constraint-based planning abilities. We analyze underlying failure cases, open-source our benchmark, and hope our findings can inspire future research.
Anthology ID:
2025.emnlp-main.1142
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
22463–22486
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1142/
Cite (ACL):
Zifeng Ding, Sikuan Yan, Moy Yuan, Xianglong Hu, Fangru Lin, and Andreas Vlachos. 2025. TCP: a Benchmark for Temporal Constraint-Based Planning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 22463–22486, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
TCP: a Benchmark for Temporal Constraint-Based Planning (Ding et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1142.pdf
Checklist:
 2025.emnlp-main.1142.checklist.pdf