Guhong Chen
2026
Beyond Quantity: Trajectory Diversity Scaling for Code Agents
Guhong Chen | Chenghao Sun | Cheng Fu | Qiyao Wang | Zhihong Huang | ChaoPeng Wei | Guangxu Chen | Feiteng Fang | Ahmadreza Argha | Bing Zhao | Xander Xu | Qi Han | Hamid Alinejad-Rokny | Qiang Qu | Binhua Li | Shiwen Ni | Min Yang | HU Wei | Yongbin Li
Findings of the Association for Computational Linguistics: ACL 2026
As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and the diminishing returns of quantity scaling. Quantity-centric scaling also exhibits an early bottleneck that underutilizes trajectory data. We propose TDScaling, a Trajectory Diversity Scaling-based data synthesis framework for code agents that scales performance through diversity rather than raw volume. TDScaling is also more data-efficient: under a fixed training budget, increasing trajectory diversity yields larger gains than adding more trajectories, improving the performance-cost trade-off for agent training. TDScaling integrates four innovations: (1) a Business Cluster mechanism that captures real-service logical dependencies; (2) a Blueprint-driven multi-agent paradigm that enforces trajectory coherence; (3) an adaptive evolution mechanism that steers synthesis toward long-tail scenarios using Domain Entropy, Reasoning Mode Entropy, and Cumulative Action Complexity to prevent mode collapse; and (4) a sandboxed code tool that mitigates catastrophic forgetting of intrinsic coding capabilities. Experiments on general tool-use benchmarks (BFCL, τ²-Bench) and code agent tasks (RebenchT, CodeCI, BIRD) demonstrate a win-win outcome: TDScaling improves both tool-use generalization and inherent coding proficiency. Crucially, we show that trajectory diversity scaling attains a substantially higher performance ceiling than quantity scaling, establishing a resource-efficient paradigm for training robust code agents under data bottlenecks.
Towards IP Intelligence: Benchmarking Large Language Models on Intellectual Property Knowledge and Practice
Qiyao Wang | Guhong Chen | Hongbo Wang | Huaren Liu | Minghui Zhu | Zhifei Qin | Li Linwei | Yilin Yue | Shiqiang Wang | Jiayan Li | Wu Yihang | Ziqiang Liu | Longze Chen | Run Luo | Liyang Fan | Jiaming Li | Lei Zhang | Kan Xu | Hamid Alinejad-Rokny | Chengming Li | Shiwen Ni | Yuan Lin | Min Yang
Findings of the Association for Computational Linguistics: ACL 2026
Intellectual Property (IP) is a highly specialized domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. Recent advancements in LLMs have demonstrated their potential to handle IP tasks, enabling more efficient analysis, understanding, and generation of IP-related content. However, existing datasets and benchmarks focus narrowly on patents or cover limited aspects of the IP field, lacking alignment with real-world scenarios. To bridge this gap, we introduce **IPBench**, the first comprehensive IP task taxonomy and a large-scale bilingual benchmark encompassing **8 IP mechanisms and 20 distinct tasks**, designed to evaluate LLMs in real-world IP practice. We benchmark **19 main LLMs**, ranging from general-purpose to domain-specific, including chat-oriented and reasoning-focused models, under zero-shot, few-shot, and chain-of-thought settings. Our results show that even the top-performing model, DeepSeek-V3, achieves only 75.8% accuracy, indicating significant room for improvement. Notably, open-source IP and law-oriented models lag behind closed-source general-purpose models. To foster future research, we publicly release IPBench and will expand it with additional tasks to better reflect real-world complexities and support model advancements in the IP domain. We provide the data and code in the supplementary materials.
2025
AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents
Guhong Chen | Liyang Fan | Zihan Gong | Nan Xie | Zixuan Li | Ziqiang Liu | Chengming Li | Qiang Qu | Hamid Alinejad-Rokny | Shiwen Ni | Min Yang
Findings of the Association for Computational Linguistics: ACL 2025
Current research in LLM-based simulation systems lacks comprehensive solutions for modeling real-world court proceedings, while existing legal language models struggle with dynamic courtroom interactions. We present **AgentCourt**, a comprehensive legal simulation framework that addresses these challenges through adversarial evolution of LLM-based agents. AgentCourt introduces a new adversarial evolutionary approach for agents called **AdvEvol**, which performs dynamic knowledge learning and evolution through structured adversarial interactions in a simulated courtroom program, overcoming the traditional reliance on static knowledge bases or manual annotations. By simulating 1,000 civil cases, we construct an evolving knowledge base that enhances the agents’ legal reasoning abilities. The evolved lawyer agents demonstrate outstanding performance on our newly introduced **CourtBench** benchmark, achieving a 12.1% improvement in performance compared to the original lawyer agents. Evaluations by professional lawyers confirm the effectiveness of our approach across three critical dimensions: cognitive agility, professional knowledge, and logical rigor. Beyond outperforming specialized legal models in interactive reasoning tasks, our findings emphasize the importance of adversarial learning in legal AI and suggest promising directions for extending simulation-based legal reasoning to broader judicial and regulatory contexts.
Co-authors
- Min Yang 4
- Hamid Alinejad-Rokny 3
- Shiwen Ni 3
- Liyang Fan 2
- Zihan Gong 2
- Chengming Li 2
- Ziqiang Liu 2
- Qiang Qu 2
- Qiyao Wang 2
- Ahmadreza Argha 1
- Guangxu Chen 1
- Longze Chen 1
- Feiteng Fang 1
- Cheng Fu 1
- Qi Han 1
- Zhihong Huang 1
- Binhua Li 1
- Yongbin Li 1
- Zixuan Li 1
- Jiayan Li 1
- Jiaming Li 1
- Yuan Lin 1
- Li Linwei 1
- Huaren Liu 1
- Run Luo 1
- Zhifei Qin 1
- Chenghao Sun 1
- Minghuan Tan 1
- Hongbo Wang 1
- Shiqiang Wang 1
- ChaoPeng Wei 1
- HU Wei 1
- Nan Xie 1
- Xander Xu 1
- Kan Xu 1
- Wu Yihang 1
- Tianshu Yu 1
- Yilin Yue 1
- Lei Zhang 1
- Bing Zhao 1
- Minghui Zhu 1