Wu Liu



2025

ACEBench: A Comprehensive Evaluation of LLM Tool Usage
Chen Chen | Xinlong Hao | Weiwen Liu | Xu Huang | Xingshan Zeng | Shuai Yu | Dexun Li | Yuefeng Huang | Xiangcheng Liu | Wang Xinzhi | Wu Liu
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models (LLMs) have demonstrated significant potential in decision-making and reasoning, particularly when integrated with various tools to effectively solve complex problems. However, existing benchmarks for evaluating LLMs’ tool usage face several limitations: (1) limited evaluation scenarios, often lacking assessments in real multi-turn dialogue contexts; (2) narrow evaluation dimensions, with insufficient detailed assessments of how LLMs use tools; and (3) reliance on LLMs or real API executions for evaluation, which introduces significant overhead. To address these challenges, we introduce ACEBench, a comprehensive benchmark for assessing tool usage in LLMs. ACEBench categorizes data into three primary types based on evaluation methodology: Normal, Special, and Agent. “Normal” evaluates tool usage in basic scenarios; “Special” evaluates tool usage in situations with ambiguous or incomplete instructions; “Agent” evaluates tool usage through multi-agent interactions to simulate real-world, multi-turn dialogues. We conducted extensive experiments using ACEBench, analyzing various LLMs in depth and providing a more granular examination of error causes across different data types.

2006

France Telecom R&D Beijing Word Segmenter for Sighan Bakeoff 2006
Wu Liu | Heng Li | Yuan Dong | Nan He | Haitao Luo | Haila Wang
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

2005

Chinese Word Segmentation in FTRD Beijing
Heng Li | Yuan Dong | Xinnian Mao | Haila Wang | Wu Liu
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing