Sikuan Yan
2026
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Sikuan Yan | Xiufeng Yang | Zuchao Huang | Ercong Nie | Zifeng Ding | Zonggen Li | Xiaowen Ma | Jinhe Bi | Kristian Kersting | Jeff Z. Pan | Hinrich Schuetze | Volker Tresp | Yunpu Ma
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B–14B).
2025
TCP: a Benchmark for Temporal Constraint-Based Planning
Zifeng Ding | Sikuan Yan | Moy Yuan | Xianglong Hu | Fangru Lin | Andreas Vlachos
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Temporal reasoning and planning are essential capabilities for large language models (LLMs), yet most existing benchmarks evaluate them in isolation and under limited forms of complexity. To address this gap, we introduce the Temporal Constraint-based Planning (TCP) benchmark, which jointly assesses both capabilities. Each instance in TCP features a naturalistic dialogue around a collaborative project, where diverse and interdependent temporal constraints are explicitly or implicitly expressed, and models must infer an optimal schedule that satisfies all constraints. To construct TCP, we generate abstract problem prototypes that are then paired with realistic scenarios from various domains and enriched into dialogues using an LLM. A human quality check is performed on a sampled subset to confirm the reliability of our benchmark. We evaluate state-of-the-art LLMs and find that even the strongest models may struggle with TCP, highlighting its difficulty and revealing limitations in LLMs’ temporal constraint-based planning abilities. We analyze underlying failure cases, open-source our benchmark, and hope our findings can inspire future research.