ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution

Alexandru Coca, Mark Gaynor, Zhenxing Zhang, Jianpeng Cheng, Bo-Hsiang Tseng, Peter Boothroyd, Hector Martinez Alonso, Diarmuid O Seaghdha, Anders Johannsen


Abstract
This work evaluates the potential of large language models (LLMs) to power digital assistants capable of complex action execution. Such assistants rely on pre-trained programming knowledge to execute multi-step goals by composing objects and functions defined in assistant libraries into action execution programs. To achieve this, we develop ASPERA, a framework comprising an assistant library simulation and a human-assisted LLM data generation engine. Our engine allows developers to guide LLM generation of high-quality tasks consisting of complex user queries, simulation state and corresponding validation programs, tackling data availability and evaluation robustness challenges. Alongside the framework we release Asper-Bench, an evaluation dataset of 250 challenging tasks generated using ASPERA, which we use to show that program generation grounded in custom assistant libraries is a significant challenge to LLMs compared to dependency-free code generation.
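To make the task format in the abstract concrete, the following is a minimal sketch of a toy assistant library, an action execution program that composes its functions, and a validation program that inspects the simulation state afterwards. Every name here (`Employee`, `Event`, `SimulationState`, `find_manager`, `add_event`, `build_state`, and the example query) is a hypothetical illustration, not ASPERA's actual API or an Asper-Bench task.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


# --- Hypothetical assistant library (illustrative names, not ASPERA's API) ---
@dataclass
class Employee:
    name: str
    manager: Optional[str] = None  # name of this employee's manager, if any


@dataclass
class Event:
    subject: str
    attendees: List[str]
    start: datetime
    end: datetime


@dataclass
class SimulationState:
    employees: Dict[str, Employee] = field(default_factory=dict)
    calendar: List[Event] = field(default_factory=list)


def find_manager(state: SimulationState, name: str) -> Employee:
    """Look up the manager of the named employee."""
    manager_name = state.employees[name].manager
    if manager_name is None:
        raise LookupError(f"{name} has no manager")
    return state.employees[manager_name]


def add_event(state: SimulationState, event: Event) -> None:
    """Add an event to the simulated calendar."""
    state.calendar.append(event)


# --- Simulation state for one task (assumed example data) ---
def build_state() -> SimulationState:
    return SimulationState(
        employees={"Ana": Employee("Ana", manager="Ben"), "Ben": Employee("Ben")}
    )


# --- Action execution program for the assumed query:
# "Schedule a 30-minute meeting with my manager at <start>" ---
def schedule_meeting_with_manager(
    state: SimulationState, user: str, start: datetime
) -> Event:
    manager = find_manager(state, user)  # step 1: resolve "my manager"
    event = Event(
        subject="Meeting",
        attendees=[user, manager.name],
        start=start,
        end=start + timedelta(minutes=30),  # step 2: derive the end time
    )
    add_event(state, event)  # step 3: change the simulation state
    return event


# --- Validation program: checks the resulting state, not the program text ---
def validate(state: SimulationState, user: str, start: datetime) -> bool:
    manager = find_manager(state, user)
    return any(
        set(e.attendees) == {user, manager.name}
        and e.start == start
        and e.end - e.start == timedelta(minutes=30)
        for e in state.calendar
    )
```

Validating against the final state rather than comparing program text means any program that reaches the correct outcome passes, which is one plausible reading of how validation programs support robust evaluation.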
Anthology ID:
2025.acl-long.1234
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
25399–25434
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1234/
Cite (ACL):
Alexandru Coca, Mark Gaynor, Zhenxing Zhang, Jianpeng Cheng, Bo-Hsiang Tseng, Peter Boothroyd, Hector Martinez Alonso, Diarmuid O Seaghdha, and Anders Johannsen. 2025. ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25399–25434, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution (Coca et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1234.pdf