ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution

Alexandru Coca; Mark Gaynor; Zhenxing Zhang; Jianpeng Cheng; Bo-Hsiang Tseng; Peter Boothroyd; Héctor Martínez Alonso; Diarmuid Ó Séaghdha; Anders Johannsen

Abstract

This work evaluates the potential of large language models (LLMs) to power digital assistants capable of complex action execution. Such assistants rely on pre-trained programming knowledge to execute multi-step goals by composing objects and functions defined in assistant libraries into action execution programs. To achieve this, we develop ASPERA, a framework comprising an assistant library simulation and a human-assisted LLM data generation engine. Our engine allows developers to guide LLM generation of high-quality tasks consisting of complex user queries, simulation state and corresponding validation programs, tackling data availability and evaluation robustness challenges. Alongside the framework we release Asper-Bench, an evaluation dataset of 250 challenging tasks generated using ASPERA, which we use to show that program generation grounded in custom assistant libraries is a significant challenge to LLMs compared to dependency-free code generation.
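To make the task structure described in the abstract concrete, here is a minimal sketch of one task: a user query, a simulation state, an action execution program that composes library objects and functions, and a validation program that inspects the resulting state. Everything in it is a hypothetical illustration and not the ASPERA API: the `Event` and `Calendar` classes, the `execute_query` and `validate` functions, the query text, and the date are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Event:
    # Hypothetical library object; not taken from ASPERA.
    title: str
    start: datetime
    end: datetime
    attendees: list[str] = field(default_factory=list)


class Calendar:
    """Toy simulation state: an in-memory list of events (hypothetical)."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def add_event(self, event: Event) -> None:
        self.events.append(event)

    def find_events(self, day: datetime) -> list[Event]:
        # Return all events that start on the same calendar day as `day`.
        return [e for e in self.events if e.start.date() == day.date()]


def execute_query(calendar: Calendar, today: datetime) -> None:
    """Action execution program for the assumed query:
    'Schedule a 30-minute sync with Ana tomorrow at 10am unless I am busy then.'
    """
    start = (today + timedelta(days=1)).replace(
        hour=10, minute=0, second=0, microsecond=0
    )
    end = start + timedelta(minutes=30)
    # Two intervals overlap when each one starts before the other ends.
    busy = any(e.start < end and start < e.end for e in calendar.find_events(start))
    if not busy:
        calendar.add_event(Event("Sync with Ana", start, end, ["Ana"]))


def validate(calendar: Calendar, today: datetime) -> bool:
    """Validation program: check the simulation state after execution."""
    tomorrow = today + timedelta(days=1)
    matches = [e for e in calendar.find_events(tomorrow) if "Ana" in e.attendees]
    return (
        len(matches) == 1
        and matches[0].start.hour == 10
        and matches[0].end - matches[0].start == timedelta(minutes=30)
    )


today = datetime(2025, 7, 28, 9, 0)  # arbitrary date chosen for the example
calendar = Calendar()                # initial simulation state: empty calendar
execute_query(calendar, today)
result = validate(calendar, today)
```

In this sketch, correctness is judged by checking the simulation state after the program runs, not by comparing program text, so differently written programs that reach the same state would all pass.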

Anthology ID: 2025.acl-long.1234
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 25399–25434
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1234/
Cite (ACL): Alexandru Coca, Mark Gaynor, Zhenxing Zhang, Jianpeng Cheng, Bo-Hsiang Tseng, Peter Boothroyd, Hector Martinez Alonso, Diarmuid O Seaghdha, and Anders Johannsen. 2025. ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 25399–25434, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): ASPERA: A Simulated Environment to Evaluate Planning for Complex Action Execution (Coca et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1234.pdf
