Fan Wu (吴凡, 吴钒) - ACL Anthology

Fan Wu

Also published as: 凡吴, 钒吴

2025

The dream to create AI assistants as capable and versatile as the fictional J.A.R.V.I.S from Iron Man has long captivated imaginations. With the evolution of multi-modal large language models ((M)LLMs), this dream is closer to reality, as (M)LLM-based Agents using computers, mobile phones and web browsers by operating within the environments and interfaces (e.g., Graphical User Interface (GUI) and Command Line Interface (CLI)) provided by operating systems (OS) to automate tasks have significantly advanced. This paper presents a comprehensive survey on these advanced agents, designated as OS Agents. We begin by elucidating the fundamentals of OS Agents, exploring their key components and capabilities. We then examine methodologies for constructing OS Agents, focusing on domain-specific foundation models and agent frameworks. A detailed review of evaluation metrics and benchmarks highlights how OS Agents are assessed across diverse platforms and tasks. Finally, we discuss current challenges and identify promising directions for future research. An open-source GitHub repository is maintained as a dynamic resource to foster further innovation in this field.

Extensive LLM applications demand efficient structured generations, particularly for LR(1) grammars, to produce outputs in specified formats (e.g., JSON). Existing methods primarily parse LR(1) grammars into a pushdown automaton (PDA), leading to runtime execution overhead for context-dependent token processing, especially inefficient under large inference batches.To address these issues, we propose Pre³ that exploits deterministic pushdown automata (DPDA) to optimize the constrained LLM decoding efficiency.First, by **pre**computing **pre**fix-conditioned edges during the **pre**processing, Pre³ enables ahead-of-time edge analysis and thus makes parallel transition processing possible.Futher, leveraging the prefix-conditioned edges, Pre³ introduces a novel approach that transforms LR(1) transition graphs into DPDA, eliminating the need for runtime path exploration and achieving edge transitions with minimal overhead.Pre³ can be seamlessly integrated into standard LLM inference frameworks, improving time per output token (TPOT) by up to 40% and throughput by up to 36% in our experiments. Our code is available at https://github.com/ModelTC/lightllm.

This paper introduces MadaKV, a modality-adaptive key-value (KV) cache eviction strategy designed to enhance the efficiency of multimodal large language models (MLLMs) in long-context inference. In multimodal scenarios, attention heads exhibit varying preferences for different modalities, resulting in significant disparities in modality importance across attention heads. Traditional KV cache eviction methods, which are tailored for unimodal settings, fail to capture modality-specific information, thereby yielding suboptimal performance. MadaKV addresses these challenges through two key components: modality preference adaptation and hierarchical compression compensation. By dynamically sensing modality information within attention heads and adaptively retaining critical tokens, MadaKV achieves substantial reductions in KV cache memory footprint and model inference decoding latency (1.3 to 1.5 times improvement) while maintaining high accuracy across various multimodal long-context tasks. Extensive experiments on representative MLLMs and the MileBench benchmark demonstrate the effectiveness of MadaKV compared to existing KV cache eviction methods.

2024

Adapting pretrained models to downstream tasks is important in practical applications. Existing frameworks adapt from an initial pretrained model to each downstream task directly, but ignore the sequential nature of the downstream tasks and their feedback effect on the pretrained model. In this work, we propose a new framework, called BiKT, to enable bidirectional knowledge transfer between pretrained models and downstream tasks in rounds. We model each downstream task in the current round as a target task for adaptation and treat all the tasks in the previous rounds as source tasks for feedback. We design a feedback algorithm by multi-task learning over the labeled data of the source tasks, where task-specific prompts are plugged into the backbone network for decoupling task-exclusive knowledge from task-shared knowledge. We further utilize the good initiation of the new backbone network updated in the feedback phase and the trained prompts of the source tasks for adaptation. Evaluation over 9 GLUE datasets, 6 SuperGLUE datasets, and 8 other datasets using models with different pretraining levels and different parameter scales shows remarkable improvement in full-shot and few-shot adaptation settings.

2023

pdf bib abs
CCL23-Eval 任务3系统报告:基于旋转式位置编码的实体分类在汉语框架语义解析中的应用(System Report for CCL23-Eval Task 3: Application of Entity Classification Model Based on Rotary Position Embedding in Chiness Frame Semantic Parsing)
Zuoheng Li (李作恒) | Xuanzhi Guo (郭炫志) | Dengjian Qiao (乔登俭) | Fan Wu (吴钒)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)

“汉语框架语义解析(Chinese Frame Semantic Parsing,CFSP)是中文自然语言处理领域中的一项重要任务,其目标是从句子中提取框架语义结构,实现对句子中涉及到的事件或情境的深层理解。本文主要研究子任务框架识别和论元角色识别,自然语言处理中常用的方法在框架识别和论元角色识别中会丢失目标词与整体句子之间的位置信息关系以及目标词内部信息,对此本文提出基于旋转式位置编码的实体分类模型对实体之间计算注意力然后进行分类,并在天池“CCL2023-Eval 汉语框架语义解析评测”比赛上获得A、B榜第一名的成绩1。”

2020

pdf bib abs
基于阅读理解框架的中文事件论元抽取(Chinese Event Argument Extraction using Reading Comprehension Framework)
Min Chen (陈敏) | Fan Wu (吴凡) | Zhongqing Wang (王中卿) | Peifeng Li (李培峰) | Qiaoming Zhu (朱巧明)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

传统的事件论元抽取方法把该任务当作句子中实体提及的多分类或序列标注任务,论元角色的类别在这些方法中只能作为向量表示,而忽略了论元角色的先验信息。实际上,论元角色的语义和论元本身有很大关系。对此,本文提议将其当作机器阅读理解任务,把论元角色表述为自然语言描述的问题,通过在上下文中回答这些问题来抽取论元。该方法更好地利用了论元角色类别的先验信息,在ACE2005中文语料上的实验证明了该方法的有效性。

Fan Wu

Fixing paper assignments

2025

2024

2023

2020

2014

Co-authors

Venues