Wenxuan Lu


2025

pdf bib
Cultivating Gaming Sense for Yourself: Making VLMs Gaming Experts
Wenxuan Lu | Jiangyang He | Zhanqiu Zhang | Steven Y. Guo | Tianning Zang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Developing agents capable of fluid gameplay in first/third-person games without API access remains a critical challenge in Artificial General Intelligence (AGI). Recent efforts leverage Vision Language Models (VLMs) as direct controllers, frequently pausing the game to analyze screens and plan action through language reasoning. However, this inefficient paradigm fundamentally restricts agents to basic and non-fluent interactions: relying on isolated VLM reasoning for each action makes it impossible to handle tasks requiring high reactivity (e.g., FPS shooting) or dynamic adaptability (e.g., ACT combat). To handle this, we propose a paradigm shift in gameplay agent design: instead of direct control, VLM serves as a developer, creating specialized execution modules tailored for tasks like shooting and combat. These modules handle real-time game interactions, elevating VLM to a high-level developer. Building upon this paradigm, we introduce GameSense, a gameplay agent framework where VLM develops task-specific game sense modules by observing task execution and leveraging vision tools and neural network training pipelines. These modules encapsulate action-feedback logic, ranging from direct action rules to neural network-based decisions. Experiments demonstrate that our framework is the first to achieve fluent gameplay in diverse genres, including ACT, FPS, and Flappy Bird, setting a new benchmark for game-playing agents.

pdf bib
Global Eye: Breaking the “Fixed Thinking Pattern” during the Instruction Expansion Process
Wenxuan Lu | Wei Liu | Jian Luan | Bin Wang | Songhao Jiang | Tianning Zang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

An extensive high-quality instruction dataset is crucial for the instruction tuning process of Large Language Models (LLMs). Recent instruction expansion methods have demonstrated their capability to improve the quality and quantity of existing datasets, by prompting high-performance LLM to generate multiple new instructions from the original ones. However, existing methods focus on constructing multi-perspective prompts (e.g., increasing complexity or difficulty) to expand instructions, overlooking the “Fixed Thinking Pattern” issue of LLMs. This issue arises when repeatedly using the same set of prompts, causing LLMs to rely on a limited set of certain expressions to expand all instructions, potentially compromising the diversity of the final expanded dataset. This paper theoretically analyzes the causes of the “Fixed Thinking Pattern”, and corroborates this phenomenon through multi-faceted empirical research. Furthermore, we propose a novel method based on dynamic prompt updating: Global Eye. Specifically, after a fixed number of instruction expansions, we analyze the statistical characteristics of newly generated instructions and then update the prompts. Experimental results show that our method enables Llama3-8B and Llama2-13B to surpass the performance of open-source LLMs and GPT3.5 across various metrics. Our code and data are submitted to the Software & Data option.

2024

pdf bib
Improving Knowledge Graph Completion with Structure-Aware Supervised Contrastive Learning
Jiashi Lin | Lifang Wang | Xinyu Lu | Zhongtian Hu | Wei Zhang | Wenxuan Lu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Knowledge Graphs (KGs) often suffer from incomplete knowledge, which restricts their utility. Recently, Contrastive Learning (CL) has been introduced to Knowledge Graph Completion (KGC), significantly improving the discriminative capabilities of KGC models and setting new benchmarks in performance. However, existing contrastive methods primarily focus on individual triples, overlooking the broader structural connectivities and topologies of KGs. This narrow focus limits a comprehensive understanding of the graph’s structural knowledge. To address this gap, we propose StructKGC, a novel contrastive learning framework designed to flexibly accommodate the diverse topologies inherent in KGs. We introduce four contrastive tasks specifically tailored to KG data: Vertex-level CL, Neighbor-level CL, Path-level CL, and Relation composition level CL. These tasks are trained synergistically during the fine-tuning of pre-trained language models (PLMs), allowing for a more nuanced capture of subgraph semantics. To validate the effectiveness of our method, we perform a comprehensive set of experiments on several real-world datasets. The experimental results demonstrate that our approach achieves SOTA performance under standard supervised and low-resource settings. Furthermore, we observe that the various structure-aware tasks introduced can mutually reinforce each other, resulting in consistent performance enhancements.

pdf bib
Hit the Nail on the Head: Parameter-Efficient Multi-task Tuning via Human Language Intervention
Wenxuan Lu | Songhao Jiang | Wang Yijing | Tianning Zang
Findings of the Association for Computational Linguistics: EMNLP 2024

Parameter-Efficient Fine-Tuning (PEFT) on small Pre-trained Language Models (PLMs) has emerged as a promising approach to enhance their multi-tasking capabilities. Prevalent methods simultaneously train additional modules (i.e., one task-shared module and multiple task-specific modules) for adapting PLMs to downstream tasks. However, their adaptability to new tasks is constrained, as the task-specific modules independently adapt to each task, overlooking the potential for knowledge transfer across tasks. In this paper, we propose a novel multi-task learning framework, Inspirational Pointer (IP), that enables the transfer of prior knowledge across tasks through human language intervention. Specifically, we attach task descriptions to the input samples, which are then mapped to corresponding task embeddings. Based on those embeddings, we adapt PLMs for downstream tasks. Similar tasks share akin descriptions, allowing new task samples close to similar trained tasks in the task embedding space, hitting the memory about trained tasks of the model. Our experiments on the T5 model demonstrate performance improvements of our method in multi-task learning and few-shot transfer learning. Further, we implemented the IP in decoder-only models including GPT2 and large language models (LLMs), and the results show that IP enhances the capabilities of decoder-only models.