Jian Jiang
2026
HeteroCache: A Dynamic Retrieval Approach to Heterogeneous KV Cache Compression for Long-Context LLM Inference
Zhiyuan Shi | Qibo Qiu | Xuefeng | Zhonglin Jiang | Li Yu | Jian Jiang | Xiaofei He | Wenxiao Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The linear memory growth of the KV cache poses a significant bottleneck for LLM inference in long-context tasks. Existing static compression methods often fail to preserve globally important information. Although recent dynamic retrieval approaches attempt to address this issue, they typically suffer from coarse-grained caching strategies and incur high I/O overhead. To overcome these limitations, we propose HeteroCache, a training-free dynamic compression framework. Our method is built on two key insights: attention heads exhibit diverse temporal heterogeneity, and there is significant spatial redundancy among heads within the same layer. Guided by these insights, HeteroCache categorizes heads based on stability and similarity, applying a fine-grained weighting strategy that allocates larger cache budgets to heads with rapidly shifting attention to capture context changes. Furthermore, it features a hierarchical storage mechanism where representative heads monitor attention drift to trigger asynchronous, on-demand context retrieval, thereby hiding I/O latency. Experiments demonstrate that HeteroCache achieves state-of-the-art performance on long-context benchmarks and accelerates decoding by up to 3× compared to the original model with a 224K context. Our code is available at https://github.com/ponytaill/HeteroCache.
2017
EICA Team at SemEval-2017 Task 3: Semantic and Metadata-based Features for Community Question Answering
Yufei Xie | Maoquan Wang | Jing Ma | Jian Jiang | Zhao Lu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
We describe our system for SemEval-2017 Task 3 on Community Question Answering. Our approach combines a rich set of features of two types: semantic and metadata. The most important groups turned out to be the metadata features and the semantic vectors trained on QatarLiving data. In the main Subtask C, our primary submission was ranked fourth, with a MAP of 13.48 and accuracy of 97.08. In Subtask A, our primary submission placed in the top 50%.