Li Yu
2026
HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
Zhiyuan Shi | Qibo Qiu | Xuefeng | Zhonglin Jiang | Li Yu | Jian Jiang | Xiaofei He | Wenxiao Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The linear memory growth of the KV cache poses a significant bottleneck for LLM inference in long-context tasks. Existing static compression methods often fail to preserve globally important information. Although recent dynamic retrieval approaches attempt to address this issue, they typically suffer from coarse-grained caching strategies and incur high I/O overhead. To overcome these limitations, we propose HeteroCache, a training-free dynamic compression framework. Our method is built on two key insights: attention heads exhibit diverse temporal heterogeneity, and there is significant spatial redundancy among heads within the same layer. Guided by these insights, HeteroCache categorizes heads based on stability and similarity, applying a fine-grained weighting strategy that allocates larger cache budgets to heads with rapidly shifting attention to capture context changes. Furthermore, it features a hierarchical storage mechanism where representative heads monitor attention drift to trigger asynchronous, on-demand context retrieval, thereby hiding I/O latency. Experiments demonstrate that HeteroCache achieves state-of-the-art performance on long-context benchmarks and accelerates decoding by up to 3× compared to the original model with a 224K context. Our code is available at https://github.com/ponytaill/HeteroCache.
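The idea of giving larger cache budgets to heads whose attention shifts quickly can be illustrated with a minimal sketch. This is not the HeteroCache implementation: the function name `allocate_budgets`, the per-head `drift` scores, the `floor` parameter, and the proportional weighting rule are all assumptions made for illustration.

```python
def allocate_budgets(drift, total_budget, floor=1):
    """Split a KV cache budget across heads in proportion to a drift score.

    drift        -- hypothetical per-head score; higher means attention shifts faster
    total_budget -- total number of cached tokens to distribute (assumed unit)
    floor        -- minimum number of tokens every head receives
    """
    n = len(drift)
    if total_budget < floor * n:
        raise ValueError("total_budget is too small to give every head the floor")

    spare = total_budget - floor * n
    total_drift = sum(drift)
    if total_drift == 0:
        # No head shows drift: fall back to an even split.
        shares = [spare / n] * n
    else:
        shares = [spare * d / total_drift for d in drift]

    # Round down, then hand leftover tokens to the largest fractional parts
    # so the budgets always sum exactly to total_budget.
    budgets = [floor + int(s) for s in shares]
    leftover = total_budget - sum(budgets)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return budgets
```

With hypothetical drift scores `[1, 3, 6]` and a budget of 103 tokens, the head with the highest drift receives the largest share, while the floor keeps stable heads from being starved entirely.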
Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution
Zouying Cao | Jiaji Deng | Li Yu | Weikang Zhou | Zhaoyang Liu | Bolin Ding | Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2026
Procedural memory enables large language model (LLM) agents to internalize "how-to" knowledge and thus reduce redundant trial-and-error. However, existing frameworks predominantly suffer from a "passive accumulation" paradigm, treating memory as a static append-only archive. To bridge the gap between static storage and dynamic reasoning, we propose ReMe (Remember Me, Refine Me), a comprehensive framework for experience-driven agent evolution. ReMe manages the memory lifecycle via three mechanisms: 1) multi-faceted distillation, which extracts fine-grained experiences by recognizing success patterns, analyzing failure triggers, and generating comparative insights; 2) context-adaptive reuse, which tailors historical insights to new contexts through scenario-aware indexing; and 3) utility-based refinement, which automatically adds validated memories and prunes outdated ones to maintain a compact, high-quality experience pool. Experiments on BFCL-V3 and AppWorld demonstrate that ReMe establishes a new state of the art among agent memory systems. Crucially, we observe a significant memory-scaling effect: Qwen3-8B equipped with ReMe outperforms the larger, memoryless Qwen3-14B, indicating that self-evolving memory provides a computation-efficient path for lifelong learning.
2025
RUC Team at SemEval-2025 Task 5: Fast Automated Subject Indexing: A Method Based on Similar Records Matching and Related Subject Ranking
Xia Tian | Yang Xin | Wu Jing | Xiu Heng | Zhang Xin | Li Yu | Gao Tong | Tan Xi | Hu Dong | Chen Tao | Jia Zhi
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
This paper presents MaRSI, an automatic subject indexing method designed to address the limitations of traditional manual indexing and emerging GenAI technologies. Focusing on improving indexing accuracy in cross-lingual contexts and balancing efficiency and accuracy in large-scale datasets, MaRSI mimics human reference learning behavior by constructing semantic indexes from pre-indexed documents. It computes similarity to retrieve relevant references, then merges and reorders their topics to generate index results. Experiments demonstrate that MaRSI outperforms supervised fine-tuning of LLMs on the same dataset, offering advantages in speed, effectiveness, and interpretability.
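The retrieve, merge, and reorder pipeline described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the MaRSI code: the function names, the use of cosine similarity over precomputed vectors, and the similarity-weighted voting used for reordering are assumptions.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_subjects(query_vec, indexed_records, top_k=2, max_subjects=3):
    """Suggest subjects for a new record from similar pre-indexed records.

    indexed_records -- list of (embedding, subjects) pairs (assumed layout)
    top_k           -- number of similar records to use as references
    max_subjects    -- number of subjects to return
    """
    # 1) Retrieve: keep the top_k records most similar to the query.
    scored = sorted(
        ((cosine(query_vec, vec), subjects) for vec, subjects in indexed_records),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]

    # 2) Merge: pool the subjects of the references, weighted by similarity.
    votes = defaultdict(float)
    for similarity, subjects in scored:
        for subject in subjects:
            votes[subject] += similarity

    # 3) Reorder: rank by accumulated weight, ties broken alphabetically.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [subject for subject, _ in ranked[:max_subjects]]
```

Because each suggested subject can be traced back to the reference records that voted for it, a scheme of this kind is easy to inspect, which is in the spirit of the interpretability advantage the abstract mentions.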