Yimeng Zhang
2026
Trajectory2Task: Training Robust Tool-Calling Agents with Synthesized Yet Verifiable Data for Complex User Intents
Ziyi Wang | Yuxuan Lu | Yimeng Zhang | Pei Chen | Ziwei Dong | Jing Huang | Jiri Gesi | Xianfeng Tang | Chen Luo | Qun Liu | Yisi Sang | Hanqing Lu | Manling Li | Jin Lai | Dakuo Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Tool-calling agents are increasingly deployed in real-world customer-facing workflows. Yet most studies on tool-calling agents focus on idealized settings with general, fixed, and well-specified tasks. In real-world applications, user requests are often (1) ambiguous, (2) changing over time, or (3) infeasible due to policy constraints, and training and evaluation data that cover these diverse, complex interaction patterns remain under-represented. To bridge the gap, we present Trajectory2Task, a verifiable data generation pipeline for studying tool use at scale under three realistic user scenarios: ambiguous intent, changing intent, and infeasible intent. The pipeline first conducts multi-turn exploration to produce valid tool-call trajectories. It then converts these trajectories into user-facing tasks with controlled intent adaptations. This process yields verifiable tasks that support closed-loop evaluation and training. We benchmark several state-of-the-art LLMs on the generated complex user scenario tasks and observe frequent failures. Finally, using successful trajectories obtained from task rollouts, we fine-tune lightweight LLMs and find consistent improvements across all three conditions, along with better generalization to unseen tool-use domains, indicating stronger tool-calling ability.
2025
Can Large Language Models Understand You Better? An MBTI Personality Detection Dataset Aligned with Population Traits
Bohan Li | Jiannan Guan | Longxu Dou | Yunlong Feng | Dingzirui Wang | Yang Xu | Enbo Wang | Qiguang Chen | Bichen Wang | Xiao Xu | Yimeng Zhang | Libo Qin | Yanyan Zhao | Qingfu Zhu | Wanxiang Che
Proceedings of the 31st International Conference on Computational Linguistics
The Myers-Briggs Type Indicator (MBTI) is one of the most influential personality theories reflecting individual differences in thinking, feeling, and behaving. MBTI personality detection has garnered considerable research interest and has evolved significantly over the years. However, this task tends to be overly optimistic, as it currently does not align well with the natural distribution of population personality traits. Specifically, the self-reported labels in existing datasets result in data quality issues and the hard labels fail to capture the full range of population personality distributions. In this paper, we identify the task by constructing MBTIBench, the first manually annotated MBTI personality detection dataset with soft labels, under the guidance of psychologists. Our experimental results confirm that soft labels can provide more benefits to other psychological tasks than hard labels. We highlight the polarized predictions and biases in LLMs as key directions for future research.
2024
SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning
Jinghan Jia | Yihua Zhang | Yimeng Zhang | Jiancheng Liu | Bharat Runwal | James Diffenderfer | Bhavya Kailkhura | Sijia Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) have highlighted the necessity of effective unlearning mechanisms to comply with data regulations and ethical AI practices. LLM unlearning aims at removing undesired data influences and associated model capabilities without compromising utility beyond the scope of unlearning. While interest in studying LLM unlearning is growing, the impact of the optimizer choice for LLM unlearning remains unexplored. In this work, we shed light on the significance of optimizer selection in LLM unlearning for the first time, establishing a clear connection between second-order optimization and influence unlearning (a classical approach using influence functions to update the model for data influence removal). This insight propels us to develop a second-order optimization-based LLM unlearning framework, termed Second-Order UnLearning (SOUL), which extends the static, one-shot model update using influence unlearning to a dynamic, iterative unlearning process. Our extensive experiments show that SOUL consistently outperforms conventional first-order methods across various unlearning tasks, models, and metrics, indicating that second-order optimization offers an effective and broadly applicable solution for LLM unlearning.
Co-authors
- Wanxiang Che (车万翔) 1
- Pei Chen 1
- Qiguang Chen (陈麒光) 1
- James Diffenderfer 1
- Ziwei Dong 1
- Longxu Dou 1
- Yunlong Feng 1
- Jiri Gesi 1
- Jiannan Guan 1
- Jing Huang 1
- Jinghan Jia 1
- Bhavya Kailkhura 1
- Jin Lai 1
- Bohan Li 1
- Manling Li 1
- Jiancheng Liu 1
- Qun Liu 1
- Sijia Liu 1
- Hanqing Lu 1
- Yuxuan Lu 1
- Chen Luo 1
- Libo Qin 1
- Bharat Runwal 1
- Yisi Sang 1
- Xianfeng Tang 1
- Bichen Wang 1
- Dakuo Wang 1
- Dingzirui Wang 1
- Enbo Wang 1
- Ziyi Wang 1
- Xiao Xu 1
- Yang Xu 1
- Yihua Zhang 1
- Yanyan Zhao 1
- Qingfu Zhu 1