Fan Wu
2025
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference
Kunxi Li | Zhonghua Jiang | Zhouzhou Shen | Zhaode Wang | Chengfei Lv | Shengyu Zhang | Fan Wu | Fei Wu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper introduces MadaKV, a modality-adaptive key-value (KV) cache eviction strategy designed to enhance the efficiency of multimodal large language models (MLLMs) in long-context inference. In multimodal scenarios, attention heads exhibit varying preferences for different modalities, resulting in significant disparities in modality importance across attention heads. Traditional KV cache eviction methods, which are tailored for unimodal settings, fail to capture modality-specific information, thereby yielding suboptimal performance. MadaKV addresses these challenges through two key components: modality preference adaptation and hierarchical compression compensation. By dynamically sensing modality information within attention heads and adaptively retaining critical tokens, MadaKV achieves substantial reductions in KV cache memory footprint and model inference decoding latency (1.3 to 1.5 times improvement) while maintaining high accuracy across various multimodal long-context tasks. Extensive experiments on representative MLLMs and the MileBench benchmark demonstrate the effectiveness of MadaKV compared to existing KV cache eviction methods.
Pre3: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation
Junyi Chen | Shihao Bai | Zaijun Wang | Siyu Wu | Chuheng Du | Hailong Yang | Ruihao Gong | Shengzhong Liu | Fan Wu | Guihai Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Extensive LLM applications demand efficient structured generation, particularly for LR(1) grammars, to produce outputs in specified formats (e.g., JSON). Existing methods primarily parse LR(1) grammars into a pushdown automaton (PDA), leading to runtime execution overhead for context-dependent token processing, which is especially inefficient under large inference batches. To address these issues, we propose Pre3, which exploits deterministic pushdown automata (DPDA) to optimize constrained LLM decoding efficiency. First, by **pre**computing **pre**fix-conditioned edges during **pre**processing, Pre3 enables ahead-of-time edge analysis and thus makes parallel transition processing possible. Further, leveraging the prefix-conditioned edges, Pre3 introduces a novel approach that transforms LR(1) transition graphs into DPDA, eliminating the need for runtime path exploration and achieving edge transitions with minimal overhead. Pre3 can be seamlessly integrated into standard LLM inference frameworks, improving time per output token (TPOT) by up to 40% and throughput by up to 36% in our experiments. Our code is available at https://github.com/ModelTC/lightllm.
OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use
Xueyu Hu | Tao Xiong | Biao Yi | Zishu Wei | Ruixuan Xiao | Yurun Chen | Jiasheng Ye | Meiling Tao | Xiangxin Zhou | Ziyu Zhao | Yuhuai Li | Shengze Xu | Shenzhi Wang | Xinchen Xu | Shuofei Qiao | Zhaokai Wang | Kun Kuang | Tieyong Zeng | Liang Wang | Jiwei Li | Yuchen Eleanor Jiang | Wangchunshu Zhou | Guoyin Wang | Keting Yin | Zhou Zhao | Hongxia Yang | Fan Wu | Shengyu Zhang | Fei Wu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The dream of creating AI assistants as capable and versatile as the fictional J.A.R.V.I.S from Iron Man has long captivated imaginations. With the evolution of multi-modal large language models ((M)LLMs), this dream is closer to reality: (M)LLM-based agents that automate tasks on computers, mobile phones, and web browsers by operating within the environments and interfaces provided by operating systems (OS), such as the Graphical User Interface (GUI) and Command Line Interface (CLI), have advanced significantly. This paper presents a comprehensive survey on these advanced agents, designated as OS Agents. We begin by elucidating the fundamentals of OS Agents, exploring their key components and capabilities. We then examine methodologies for constructing OS Agents, focusing on domain-specific foundation models and agent frameworks. A detailed review of evaluation metrics and benchmarks highlights how OS Agents are assessed across diverse platforms and tasks. Finally, we discuss current challenges and identify promising directions for future research. An open-source GitHub repository is maintained as a dynamic resource to foster further innovation in this field.
2024
BiKT: Enabling Bidirectional Knowledge Transfer Between Pretrained Models and Sequential Downstream Tasks
Hang Zeng | Chaoyue Niu | Fan Wu | Shaojie Tang | Leihao Pei | Chengfei Lv | Guihai Chen
Findings of the Association for Computational Linguistics: EMNLP 2024
Adapting pretrained models to downstream tasks is important in practical applications. Existing frameworks adapt an initial pretrained model to each downstream task directly, but ignore the sequential nature of the downstream tasks and their feedback effect on the pretrained model. In this work, we propose a new framework, called BiKT, to enable bidirectional knowledge transfer between pretrained models and downstream tasks in rounds. We model each downstream task in the current round as a target task for adaptation and treat all the tasks in the previous rounds as source tasks for feedback. We design a feedback algorithm based on multi-task learning over the labeled data of the source tasks, where task-specific prompts are plugged into the backbone network to decouple task-exclusive knowledge from task-shared knowledge. For adaptation, we further utilize the good initialization provided by the new backbone network updated in the feedback phase, together with the trained prompts of the source tasks. Evaluation over 9 GLUE datasets, 6 SuperGLUE datasets, and 8 other datasets using models with different pretraining levels and different parameter scales shows remarkable improvement in full-shot and few-shot adaptation settings.
Co-authors
- Guihai Chen 2
- Chengfei Lv 2
- Fei Wu 2
- Shengyu Zhang 2
- Shihao Bai 1
- Junyi Chen 1
- Yurun Chen 1
- Chuheng Du 1
- Ruihao Gong 1
- Xueyu Hu 1
- Yuchen Eleanor Jiang 1
- Zhonghua Jiang 1
- Kun Kuang 1
- Jiwei Li 1
- Kunxi Li 1
- Yuhuai Li 1
- Shengzhong Liu 1
- Chaoyue Niu 1
- Leihao Pei 1
- Shuofei Qiao 1
- Zhouzhou Shen 1
- Shaojie Tang 1
- Meiling Tao 1
- Guoyin Wang 1
- Liang Wang 1
- Shenzhi Wang 1
- Zaijun Wang 1
- Zhaode Wang 1
- Zhaokai Wang 1
- Zishu Wei 1
- Siyu Wu 1
- Ruixuan Xiao 1
- Tao Xiong 1
- Shengze Xu 1
- Xinchen Xu 1
- Hailong Yang 1
- Hongxia Yang 1
- Jiasheng Ye 1
- Biao Yi 1
- Keting Yin 1
- Hang Zeng 1
- Tieyong Zeng 1
- Zhou Zhao 1
- Ziyu Zhao 1
- Wangchunshu Zhou 1
- Xiangxin Zhou 1