Zihao Tang
2026
Too Long, Do Re-weighting for Efficient LLM Reasoning Compression
Zhong-Zhi Li | Xiao Liang | Zihao Tang | Lei Ji | Peijie Wang | Haotian Xu | Xing W | Haizhen Huang | Weiwei Deng | Yeyun Gong | Zhijiang Guo | Xiao Liu | Fei Yin | Cheng-Lin Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have recently achieved remarkable progress on complex reasoning tasks by leveraging extended Chain-of-Thought (CoT) techniques. These reasoning processes can be roughly categorized into System-1 (fast and intuitive) and System-2 (slow and deliberate) paradigms. However, excessive reliance on lengthy System-2-style reasoning during inference can produce extremely long outputs, thereby reducing efficiency. In this work, we propose Thinking Length Data Re-weighting (TLDR), a method that does not rely on sophisticated data annotations or interpolation between multiple models. We continuously balance the weights between the model’s System-1 and System-2 data to eliminate redundant reasoning processes while preserving the model’s reasoning capability. We validate our method across multiple base models, including Deepseek-R1-Distilled Qwen models, and on diverse benchmarks of varying difficulty. Our method reduces the number of output tokens by nearly 40% while maintaining reasoning accuracy.
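As a rough illustration of what balancing System-1 and System-2 data could look like, the following is a minimal sketch and not the authors' implementation. The update rule, the function names `update_system2_weight` and `sample_batch`, and the parameters `target_accuracy` and `step` are all assumptions introduced here; the abstract does not specify how the weights are adjusted.

```python
import random


def update_system2_weight(w, accuracy, target_accuracy, step=0.1):
    """Hypothetical update rule (an assumption, not from the paper).

    Raise the share of long System-2 samples when accuracy falls below
    the target, and lower it otherwise to encourage shorter reasoning.
    The result is clamped to the range [0, 1].
    """
    if accuracy < target_accuracy:
        w += step
    else:
        w -= step
    return min(1.0, max(0.0, w))


def sample_batch(system1_data, system2_data, w, batch_size, rng):
    """Draw a training batch in which each example comes from the
    System-2 pool with probability w, else from the System-1 pool."""
    batch = []
    for _ in range(batch_size):
        pool = system2_data if rng.random() < w else system1_data
        batch.append(rng.choice(pool))
    return batch
```

In this sketch the weight would be re-estimated periodically from held-out accuracy, so the data mixture drifts toward short reasoning only as long as accuracy holds.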
Mnemis: Dual-Route Retrieval on Hierarchical Graphs for Long-Term LLM Memory
Zihao Tang | Xin Yu | Ziyu Xiao | Zengxuan Wen | Zelin Li | Jiaxi Zhou | Hualei Wang | Haohua Wang | Haizhen Huang | Weiwei Deng | Feng Sun | Qi Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
AI memory, specifically how models organize and retrieve historical messages, is becoming increasingly valuable to Large Language Models (LLMs), yet existing methods (RAG and Graph-RAG) primarily retrieve memory through similarity-based mechanisms. While efficient, such System-1-style retrieval struggles with scenarios that require global reasoning or comprehensive coverage of all relevant information. In this work, we propose Mnemis, a novel memory framework that integrates System-1 similarity search with a complementary System-2 mechanism, termed Global Selection. Mnemis organizes memory into a base graph for similarity retrieval and a hierarchical graph that enables top-down, deliberate traversal over semantic hierarchies. By combining the complementary strengths of both retrieval routes, Mnemis retrieves memory items that are both semantically and structurally relevant. Mnemis achieves state-of-the-art performance across all compared methods on long-term memory benchmarks, scoring 93.9 on LoCoMo and 91.6 on LongMemEval-S using GPT-4.1-mini.
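The two retrieval routes described in the abstract can be pictured with a toy sketch. This is not the Mnemis implementation: the word-overlap similarity, the dictionary-based tree standing in for the hierarchical graph, and the names `similarity_route`, `global_route`, `dual_route_retrieve`, and `is_relevant` are all assumptions made for illustration. In particular, `is_relevant` is a hypothetical stand-in for whatever selection step the system actually uses at each level.

```python
def jaccard(a, b):
    """Toy word-overlap similarity, standing in for an embedding model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def similarity_route(query, items, k):
    """System-1 style route: return the k items most similar to the query."""
    ranked = sorted(items, key=lambda it: jaccard(query, it), reverse=True)
    return ranked[:k]


def global_route(query, node, is_relevant):
    """System-2 style route: walk the hierarchy top-down, descending only
    into nodes judged relevant, and collect every item found under them.

    node is a dict: {"label": str, "items": [...], "children": [...]}.
    """
    if not is_relevant(query, node["label"]):
        return []
    found = list(node.get("items", []))
    for child in node.get("children", []):
        found.extend(global_route(query, child, is_relevant))
    return found


def dual_route_retrieve(query, items, root, is_relevant, k=2):
    """Merge both routes, keeping first-seen order and dropping duplicates."""
    merged = []
    candidates = similarity_route(query, items, k)
    candidates += global_route(query, root, is_relevant)
    for it in candidates:
        if it not in merged:
            merged.append(it)
    return merged
```

The point of the sketch is the coverage difference: the similarity route returns only the top-k matches, while the top-down route returns everything under a selected branch, including items that share no wording with the query.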