2025
PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data
Juntao Tan | Liangwei Yang | Zuxin Liu | Zhiwei Liu | Rithesh R N | Tulika Manoj Awalgaonkar | Jianguo Zhang | Weiran Yao | Ming Zhu | Shirley Kokane | Silvio Savarese | Huan Wang | Caiming Xiong | Shelby Heinecke
Findings of the Association for Computational Linguistics: ACL 2025
Personalization is essential for AI assistants, especially in private AI settings where models are expected to interpret users’ personal data (e.g., conversations, app usage) to understand their background, preferences, and social context. However, due to privacy concerns, existing academic research lacks direct access to such data, making benchmarking difficult. To fill this gap, we propose a synthetic data pipeline that generates realistic user profiles and private documents, enabling the creation of PersonaBench—a benchmark for evaluating models’ ability to understand personal information. Using this benchmark, we assess Retrieval-Augmented Generation (RAG) pipelines on personalized questions and find that current models struggle to accurately extract and answer questions even when provided with the full set of user documents, highlighting the need for improved personalization methods.
xLAM: A Family of Large Action Models to Empower AI Agent Systems
Jianguo Zhang | Tian Lan | Ming Zhu | Zuxin Liu | Thai Quoc Hoang | Shirley Kokane | Weiran Yao | Juntao Tan | Akshara Prabhakar | Haolin Chen | Zhiwei Liu | Yihao Feng | Tulika Manoj Awalgaonkar | Rithesh R N | Zeyuan Chen | Ran Xu | Juan Carlos Niebles | Shelby Heinecke | Huan Wang | Silvio Savarese | Caiming Xiong
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Autonomous agents powered by large language models (LLMs) have attracted significant research interest. However, the open-source community faces many challenges in developing specialized models for agent tasks, driven by the scarcity of high-quality agent datasets and the absence of standard protocols in this area. We introduce xLAM, a series of large action models designed for AI agent tasks. The xLAM series includes five models with both dense and mixture-of-expert architectures, ranging from 1B to 8x22B parameters, trained using a scalable, flexible pipeline that unifies, augments, and synthesizes diverse datasets to enhance AI agents’ generalizability and performance across varied environments. Our experimental results demonstrate that xLAM consistently delivers exceptional performance across multiple agent ability benchmarks, notably securing the 1st position on the Berkeley Function-Calling Leaderboard, outperforming GPT-4, Claude-3, and many other models in terms of tool use. By releasing the xLAM series, we aim to advance the performance of open-source LLMs for autonomous AI agents, potentially accelerating progress and democratizing access to high-performance models for agent tasks.
2024
PRAct: Optimizing Principled Reasoning and Acting of LLM Agent
Zhiwei Liu | Weiran Yao | Jianguo Zhang | Zuxin Liu | Liangwei Yang | Rithesh R N | Tian Lan | Ming Zhu | Juntao Tan | Shirley Kokane | Thai Quoc Hoang | Juan Carlos Niebles | Shelby Heinecke | Huan Wang | Silvio Savarese | Caiming Xiong
Proceedings of the 28th Conference on Computational Natural Language Learning
We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique current action principles and an optimizer to update them accordingly. We investigate the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, we developed two RPO methods, RPO-Traj and RPO-Batch, to adapt to different settings. Experimental results across four environments demonstrate that the PRAct agent, leveraging the RPO framework, can effectively learn and apply action principles to enhance performance.