Runnan Fang
2026
U-Fold: Dynamic Intent-Aware Context Folding for User-Centric Agents
Jin Su | Runnan Fang | Yeqiu Li | Xiaobin Wang | Shihao Cai | Pengjun Xie | Ningyu Zhang | Fajie Yuan
Findings of the Association for Computational Linguistics: ACL 2026
Large language model (LLM)-based agents have been successfully deployed in many tool-augmented settings, but their scalability is fundamentally constrained by context length. Existing context-folding methods mitigate this issue by summarizing past interactions, yet they are typically designed for single-query or single-intent scenarios. In more realistic user-centric dialogues, we identify two major failure modes: (i) they irreversibly discard fine-grained constraints and intermediate facts that are crucial for later decisions, and (ii) their summaries fail to track evolving user intent, leading to omissions and erroneous actions. To address these limitations, we propose U-Fold, a dynamic context-folding framework tailored to user-centric tasks. U-Fold retains the full user–agent dialogue and tool-call history but, at each turn, uses two core components to produce an intent-aware, evolving dialogue summary and a compact, task-relevant tool log. Extensive experiments on 𝜏-bench, 𝜏2-bench, VitaBench, and harder context-inflated settings show that U-Fold consistently outperforms ReAct (achieving a 71.4% win rate in long-context settings) and prior folding baselines (with improvements of up to 27.0%), particularly on long, noisy, multi-turn tasks. Our study demonstrates that U-Fold is a promising step toward transferring context-management techniques from single-query benchmarks to realistic user-centric applications.
Towards General Agentic Intelligence via Environment Scaling
Runnan Fang | Shihao Cai | Baixuan Li | Jialong Wu | Guangyu Li | Wenbiao Yin | Xinyu Wang | Xiaobin Wang | Liangcai Su | Zhen Zhang | Shibin Wu | Zhengwei Tao | Yong Jiang | Pengjun Xie | Ningyu Zhang | Fei Huang | Wentao Zhang | Jingren Zhou
Findings of the Association for Computational Linguistics: ACL 2026
Advanced agentic intelligence is a prerequisite for deploying Large Language Models in practical, real-world applications. Diverse real-world APIs demand precise, robust function-calling intelligence, which needs agents to develop these capabilities through interaction in varied environments. The breadth of function-calling competence is closely tied to the diversity of environments in which agents are trained. In this work, we scale up environments as a step towards advancing general agentic intelligence. This gives rise to two central challenges: (i) how to scale environments in a principled manner, and (ii) how to effectively train agentic capabilities from experiences derived through interactions with these environments. To address these, we design a scalable framework that automatically constructs heterogeneous environments that are fully simulated, broadening the space of function-calling scenarios. We further adapt a two-phase agent fine-tuning strategy: first endowing agents with fundamental agentic capabilities, then specializing them for domain-specific contexts. Extensive experiments on agentic benchmarks, 𝜏-bench, 𝜏2-Bench, and ACEBench, demonstrate that our trained model, AgentScaler, significantly enhances the models’ function-calling capability.
Memp: Exploring Agent Procedural Memory
Runnan Fang | Yuan Liang | Xiaobin Wang | Jialong Wu | Shuofei Qiao | Pengjun Xie | Fei Huang | Huajun Chen | Ningyu Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Model (LLM)-based agents excel at diverse tasks, yet they suffer from brittle procedural memory that is manually engineered or entangled in static parameters. In this work, we investigate strategies to endow agents with a learnable, updatable, and lifelong procedural memory. We propose a procedural-memory repository that distills past agent trajectories into both fine-grained, step-by-step instructions and higher-level, script-like abstractions. Coupled with a dynamic regimen that continuously updates, corrects, and deprecates its contents, this repository evolves in lockstep with new experience. Empirical evaluation on TravelPlanner and Alfworld shows that as the memory repository is refined, agents achieve steadily higher success rates and greater efficiency on analogous tasks. Moreover, procedural memory built from a stronger model retains its value: migrating the procedural memory to a weaker model yields substantial performance gains.
2025
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Zekun Xi | Wenbiao Yin | Jizhan Fang | Jialong Wu | Runnan Fang | Yong Jiang | Pengjun Xie | Fei Huang | Huajun Chen | Ningyu Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Machine writing with large language models often relies on retrieval-augmented generation. However, these approaches remain confined within the boundaries of the model’s predefined scope, limiting the generation of content with rich information. Specifically, vanilla-retrieved information tends to lack depth and novelty and to suffer from redundancy, which negatively impacts the quality of generated articles, leading to shallow, unoriginal, and repetitive outputs. To address these issues, we propose OmniThink, a slow-thinking machine writing framework that emulates the human-like process of iterative expansion and reflection. The core idea behind OmniThink is to simulate the cognitive behavior of learners as they slowly deepen their knowledge of the topics. Experimental results demonstrate that OmniThink improves the knowledge density of generated articles without compromising metrics such as coherence and depth. Human evaluations and expert feedback further highlight the potential of OmniThink to address real-world challenges in the generation of long-form articles.
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
Runnan Fang | Xiaobin Wang | Yuan Liang | Shuofei Qiao | Jialong Wu | Zekun Xi | Ningyu Zhang | Yong Jiang | Pengjun Xie | Fei Huang | Huajun Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate unconventional action spaces. To empower agents to autonomously explore environments, optimize workflows, and enhance their understanding of actions, we propose SynWorld, a framework that allows agents to synthesize possible scenarios with multi-step action invocation within the action space and perform Monte Carlo Tree Search (MCTS) exploration to effectively refine their action knowledge in the current environment. Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments.
WebWalker: Benchmarking LLMs in Web Traversal
Jialong Wu | Wenbiao Yin | Yong Jiang | Zhenglin Wang | Zekun Xi | Runnan Fang | Linhai Zhang | Yulan He | Deyu Zhou | Pengjun Xie | Fei Huang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Retrieval-augmented generation (RAG) demonstrates remarkable performance across tasks in open-domain question-answering. However, traditional search engines may retrieve shallow content, limiting the ability of LLMs to handle complex, multi-layered information. To address this, we introduce WebWalkerQA, a benchmark designed to assess the ability of LLMs to perform web traversal. It evaluates the capacity of LLMs to traverse a website’s subpages to extract high-quality data systematically. We propose WebWalker, which is a multi-agent framework that mimics human-like web navigation through an explore-critic paradigm. Extensive experimental results show that WebWalkerQA is challenging and demonstrate the effectiveness of RAG combined with WebWalker through horizontal and vertical integration in real-world scenarios.
2024
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning
Shuofei Qiao | Ningyu Zhang | Runnan Fang | Yujie Luo | Wangchunshu Zhou | Yuchen Jiang | Chengfei Lv | Huajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language agents have achieved considerable performance on various complex question-answering tasks by planning with external tools. Despite the incessant exploration in this field, existing language agent systems still struggle with costly, non-reproducible data reliance and face the challenge of compelling a single model for multiple functions. To this end, we introduce AutoAct, an automatic agent learning framework for QA that does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models (e.g., GPT-4). Given limited data with a tool library, AutoAct first automatically synthesizes planning trajectories without any assistance from humans or strong closed-source models. Then, AutoAct leverages a division-of-labor strategy to automatically differentiate based on the target task information and synthesized trajectories, producing a sub-agent group to complete the task. We conduct comprehensive experiments with different LLMs, which demonstrate that AutoAct yields better or parallel performance compared to various strong baselines. Further analysis demonstrates the effectiveness of the division-of-labor strategy, with the trajectory quality generated by AutoAct generally outperforming that of others.
EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models
Yixin Ou | Ningyu Zhang | Honghao Gui | Ziwen Xu | Shuofei Qiao | Runnan Fang | Lei Li | Zhen Bi | Guozhou Zheng | Huajun Chen
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist among various instruction processing methods, there is no standard open-source instruction processing implementation framework available for the community, which hinders practitioners from further developing and advancing. To facilitate instruction processing research and development, we present EasyInstruct, an easy-to-use instruction processing framework for LLMs, which modularizes instruction generation, selection, and prompting, while also considering their combination and interaction. EasyInstruct is publicly released and actively maintained on GitHub, along with an online demo app and a demo video for quick-start, calling for broader research centered on instruction data and synthetic data.
Co-authors
- Ningyu Zhang 7
- Pengjun Xie 6
- Huajun Chen 5
- Jialong Wu 5
- Yong Jiang 4
- Shuofei Qiao 4
- Xiaobin Wang 4
- Fei Huang 3
- Zekun Xi 3
- Wenbiao Yin 3
- Shihao Cai 2
- Fei Huang 2
- Yuan Liang 2
- Zhen Bi 1
- Jizhan Fang 1
- Honghao Gui 1
- Yulan He 1
- Yuchen Jiang 1
- Baixuan Li 1
- Guangyu Li 1
- Lei Li 1
- Yeqiu Li 1
- Yujie Luo 1
- Chengfei Lv 1
- Yixin Ou 1
- Jin Su 1
- Liangcai Su 1
- Zhengwei Tao 1
- Xinyu Wang 1
- Zhenglin Wang 1
- Shibin Wu 1
- Ziwen Xu 1
- Fajie Yuan 1
- Linhai Zhang 1
- Wentao Zhang 1
- Zhen Zhang 1
- Guozhou Zheng 1
- Deyu Zhou 1
- Jingren Zhou 1
- Wangchunshu Zhou 1