Fan Mo
2026
Lightweight LLM Agent Memory with Small Language Models
Jiaquan Zhang | Chaoning Zhang | Shuxu Chen | Zhenzhen Huang | Pengcheng Zheng | Zhicheng Wang | Ping Guo | Fan Mo | Sung-Ho Bae | Jie Zou | Jiwei Wei | Yang Yang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. Retrieval-based external memory systems incur low online overhead but suffer from unstable accuracy due to limited query construction and candidate filtering. In contrast, many systems use repeated large-model calls for online memory operations, which improves accuracy but accumulates latency over long interactions. We propose LightMem, a lightweight agent memory system driven by Small Language Models (SLMs). LightMem modularizes memory retrieval, writing, and long-term consolidation, and separates online processing from offline consolidation to enable efficient memory invocation under bounded compute. We organize memory into short-term memory (STM) for immediate conversational context, mid-term memory (MTM) for reusable interaction summaries, and long-term memory (LTM) for consolidated knowledge, and use user identifiers to support independent retrieval and incremental maintenance in multi-user settings. Online, LightMem operates under a fixed retrieval budget and selects memories via a two-stage procedure: vector-based coarse retrieval followed by semantic consistency re-ranking. Offline, it abstracts reusable interaction evidence and incrementally integrates it into LTM. Experiments show consistent gains across model scales, with an average F1 improvement of about 2.5 over A-MEM on LoCoMo, while achieving higher efficiency and low median latency (83 ms for retrieval and 581 ms end-to-end).
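
The online two-stage selection described in the abstract can be illustrated with a minimal sketch. This is not the LightMem implementation: the `MemoryItem` class, the `retrieve` function, the `coarse_k` and `budget` parameters, and the toy two-dimensional embeddings are all hypothetical names and values chosen for illustration. In particular, the re-ranking step here uses simple token overlap as a stand-in for the SLM-based semantic consistency scoring, whose details the abstract does not give.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    user_id: str            # memories are partitioned per user
    tier: str               # "STM", "MTM", or "LTM"
    text: str
    embedding: list[float]  # toy vector; a real system would use an encoder


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def token_overlap(query: str, text: str) -> float:
    """Jaccard overlap of lowercase tokens.

    Assumption: this is only a placeholder for an SLM-based
    semantic consistency score.
    """
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve(store, user_id, query, query_emb, coarse_k=4, budget=2):
    """Two-stage selection under a fixed retrieval budget."""
    # Restrict to the requesting user's memories (multi-user isolation).
    candidates = [m for m in store if m.user_id == user_id]
    # Stage 1: vector-based coarse retrieval, keep the top coarse_k.
    coarse = sorted(
        candidates, key=lambda m: cosine(query_emb, m.embedding), reverse=True
    )[:coarse_k]
    # Stage 2: re-rank the coarse candidates, then cut to the budget.
    reranked = sorted(
        coarse, key=lambda m: token_overlap(query, m.text), reverse=True
    )
    return reranked[:budget]


store = [
    MemoryItem("alice", "MTM", "alice likes green tea", [1.0, 0.0]),
    MemoryItem("alice", "LTM", "alice works in Berlin", [0.0, 1.0]),
    MemoryItem("alice", "STM", "what tea does alice drink", [0.9, 0.1]),
    MemoryItem("bob", "LTM", "bob likes green tea", [1.0, 0.0]),
]
selected = retrieve(
    store, "alice", "which tea does alice like", [1.0, 0.0], coarse_k=2, budget=1
)
```

In this toy store, the coarse stage and the re-ranking stage disagree on the best candidate, which is the case the second stage exists to handle. Because the number of returned items is capped by `budget` regardless of store size, the downstream cost per turn stays bounded.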