@inproceedings{ai-etal-2026-tooldna,
title = "{T}ool{DNA}: Autonomous Evolution of Tool Metadata for Robust Dialogue Agents",
author = "Ai, Qiuyuan and
Wang, Cong and
Zhang, Jiaqi and
Han, Zengxin and
Song, Jie",
editor = "Liakata, Maria and
Moreira, Viviane P. and
Zhang, Jiajun and
Jurgens, David",
booktitle = "Findings of the {A}ssociation for {C}omputational {L}inguistics: {ACL} 2026",
month = jul,
year = "2026",
address = "San Diego, California, United States",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-acl/2026.findings-acl.931/",
pages = "18660--18678",
ISBN = "979-8-89176-395-1",
abstract = "Task-oriented dialogue (TOD) systems are vital for facilitating complex, goal-directed interactions across sectors like customer support and online retail. However, they face persistent limitations: labor-intensive manual metadata tuning and sparse reinforcement learning (RL) rewards that fail to diagnose invocation errors. To address this, we propose ToolDNA, a dynamic adaptation framework enabling autonomous co-evolution of policy networks and tool metadata via RL, anchored by two synergistic loops. An RL loop optimizes policies by generating rollout trajectories (reasoning, actions, descriptive updates) from user inputs, with multi-dimensional rewards refining invocations. A tool metadata loop{---}coordinated by a dedicated Tool Manager{---}evolves metadata through policy-generated candidates during rollouts and Feedback LLM-derived refinements from historical data. These mutually reinforcing loops close traditional reward gaps, forming a closed-loop trial-error-reflection cycle for self-improvement. Extensive experiments on a real-world dataset of 3,100 customer service dialogues confirm ToolDNA{'}s superiority, with notable gains over baselines: it achieves +11{\%} problem resolution and +54{\%} accuracy over commercial LLMs with prompt engineering; +25{\%}/+35{\%} over supervised fine-tuning; and +15{\%}/+15{\%} over a traditional RL baseline. Linguistic analysis corroborates that evolved metadata retain semantic intent while enhancing parseability. Case studies in two typical contexts, i.e., car inventory search and loan calculation, further validate its ability to resolve critical ambiguities. ToolDNA pioneers scalable self-improvement for robust, deployable tool-augmented agents with minimal human oversight. We release our code to facilitate future research."
}