Yuanhao Xi


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
Rethinking Stateful Tool Use in Multi-Turn Dialogues: Benchmarks and Challenges
Hongru Wang | Wenyu Huang | Yufei Wang | Yuanhao Xi | Jianqiao Lu | Huan Zhang | Nan Hu | Zeming Liu | Jeff Z. Pan | Kam-Fai Wong
Findings of the Association for Computational Linguistics: ACL 2025

Existing benchmarks that assess Language Models (LMs) as Language Agents (LAs) for tool use primarily focus on stateless, single-turn interactions or partial evaluations, such as tool selection in a single turn, overlooking the inherent stateful nature of interactions in multi-turn applications. To fulfill this gap, we propose DialogTool, a multi-turn dialogue dataset with stateful tool interactions considering the whole life cycle of tool use, across six key tasks in three stages: 1) tool creation; 2) tool utilization: tool awareness, tool selection, tool execution; and 3) role-consistent response: response generation and role play. Furthermore, we build VirtualMobile – an embodied virtual mobile evaluation environment to simulate API calls and assess the robustness of the created APIs. Taking advantage of these artifacts, we conduct comprehensive evaluation on 13 distinct open- and closed-source LLMs and provide detailed analysis at each stage, revealing that the existing state-of-the-art LLMs still cannot perform well to use tools over long horizons .