Xiaoteng Ma
2026
Self-Sum: Teaching an Agent to Decide Itself When and What to Summarize
Hongru Wang | Rui Wang | Jushi Kai | Boyang Xue | Yongqi Li | Shijue Huang | Xiaoteng Ma | Jeff Z. Pan | Amos Storkey
Findings of the Association for Computational Linguistics: ACL 2026
Long-horizon agents operate over extended sequences of reasoning and actions, but this inevitably accumulates context noise, resulting in excessive computational cost and information overload. Existing approaches commonly rely on fixed, rule-based summarization strategies (e.g., summarizing every few steps), which are inflexible, lack generalization, and often introduce irreversible information loss. We propose Self-Sum, a framework that empowers agents to autonomously decide when and what to summarize by modeling summarization as a first-class internal cognitive action, unified with external environmental actions within a multi-turn decision-making process. Specifically, we introduce a two-stage training recipe consisting of (i) a cold-start supervised fine-tuning stage that bootstraps summarization behavior, and (ii) a lightweight, summarization-aware reinforcement learning stage that refines summarization timing and content while discouraging unnecessary summaries. Experiments on multiple long-horizon benchmarks show that Self-Sum consistently outperforms no-summarization and rule-based baselines, with particularly strong gains in generalization. Analysis further reveals that Self-Sum learns to summarize sparsely at meaningful moments and preserves task-relevant information, highlighting the importance of jointly learning when and what to summarize for robust long-horizon agent behavior.
From Word to World: Can Large Language Models be Implicit Text-based World Models?
Yixia Li | Hongru Wang | Jiahao Qiu | Zhenfei Yin | Dongdong Zhang | Cheng Qian | Zeping Li | Xiaoteng Ma | Guanhua Chen | Heng Ji
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Agentic learning increasingly hinges on interaction, yet real-world experience is expensive, limited, and often irreversible at inference time. World models promise to mitigate these limitations, but it remains unclear whether large language models can actually serve as reliable world models and deliver concrete benefits to downstream agents. We investigate these questions in text-based environments, a controlled testbed that reframes language modeling as next-state prediction under interaction. We propose a three-level framework to evaluate LLM-based world models: (i) fidelity and consistency, (ii) scalability and robustness, and (iii) agent utility. Across five representative environments, we show that sufficiently trained world models capture coherent environment dynamics, scale predictably with data and model capacity, and unlock tangible agent improvements. For example, action verification boosts GPT-4o by 5.5% on WebShop, and warm-started RL achieves a 15% gain on SciWorld. Crucially, these benefits hinge on behavioral coverage and environment complexity, sharply characterizing when world modeling meaningfully advances agent learning.