NL ⇒ Schedule: Evaluate Multitask Scheduling Capability of Large Language Models

Wenrui Liao, Weihong Du, Yi Li, Hongru Liang, Wenqiang Lei


Abstract
Automated multitask schedule generation from natural language descriptions has huge potential in modern industry. Classic methods bypass language complexities by using pre-formatted matrices, while recent LLM+solver approaches introduce new fragilities by relying on solver-specific code generation. This raises critical questions: Can large language models (LLMs) solve this NL ⇒ Schedule task well end-to-end (RQ1)? If the answer is "no", where do they fall short (RQ2)? And how can their capabilities be enhanced (RQ3)? To answer these questions, we introduce NL ⇒ Schedule, the first benchmark for this task, equipped with a dataset of 240 description-schedule pairs constructed from real-world materials and a rigorous evaluation suite. Our evaluation of nine state-of-the-art LLMs reveals the limitations of different LLMs in procedure grounding and the strengths of advanced LLMs in global planning via local analysis. To address these shortcomings, we propose Mans, a novel multi-agent framework. Extensive experiments show that Mans achieves more robust performance, comparable to that of six state-of-the-art LLM+solver methods. We hope NL ⇒ Schedule and Mans will serve as a solid foundation for automatic scheduling.
Anthology ID: 2026.acl-long.1648
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 35620–35640
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1648/
Cite (ACL): Wenrui Liao, Weihong Du, Yi Li, Hongru Liang, and Wenqiang Lei. 2026. NL ⇒ Schedule: Evaluate Multitask Scheduling Capability of Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 35620–35640, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): NL ⇒ Schedule: Evaluate Multitask Scheduling Capability of Large Language Models (Liao et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1648.pdf
Checklist: 2026.acl-long.1648.checklist.pdf