Qi Chai
2025
VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft
Honghao Fu | Junlong Ren | Qi Chai | Deheng Ye | Yujun Cai | Hao Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have shown significant promise in embodied decision-making tasks within virtual open-world environments. Nonetheless, their performance is hindered by the absence of domain-specific knowledge. Methods that finetune on large-scale domain-specific data entail prohibitive development costs. This paper introduces VistaWise, a cost-effective agent framework that integrates cross-modal domain knowledge and finetunes a dedicated object detection model for visual analysis. It reduces the requirement for domain-specific training data from millions of samples to a few hundred. VistaWise integrates visual information and textual dependencies into a cross-modal knowledge graph (KG), enabling a comprehensive and accurate understanding of multimodal environments. We also equip the agent with a retrieval-based pooling strategy to extract task-related information from the KG, and a desktop-level skill library to support direct operation of the Minecraft desktop client via mouse and keyboard inputs. Experimental results demonstrate that VistaWise achieves state-of-the-art performance across various open-world tasks, highlighting its effectiveness in reducing development costs while enhancing agent performance.
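To make the retrieval-based pooling idea concrete, here is a minimal, illustrative sketch of extracting task-related context from a cross-modal knowledge graph. The node schema, k-hop neighborhood retrieval, and pooled string format are assumptions for illustration only, not VistaWise's actual implementation.

```python
# Illustrative sketch only: node/edge schema and k-hop pooling are assumptions,
# not the paper's method.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    text: str                                   # textual knowledge, e.g. a crafting dependency
    visual: dict = field(default_factory=dict)  # e.g. detector outputs such as object counts

class CrossModalKG:
    def __init__(self):
        self.nodes: dict = {}
        self.edges: dict = {}                   # directed "requires" edges

    def add_node(self, node: Node):
        self.nodes[node.name] = node
        self.edges.setdefault(node.name, set())

    def add_edge(self, src: str, dst: str):
        self.edges.setdefault(src, set()).add(dst)

    def retrieve_pool(self, query_entities, hops=2):
        """Collect the k-hop neighborhood of the query entities and pool
        their textual + visual attributes into a compact context string."""
        seen = set(query_entities)
        frontier = deque((e, 0) for e in query_entities)
        while frontier:
            name, depth = frontier.popleft()
            if depth >= hops:
                continue
            for nxt in self.edges.get(name, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
        lines = []
        for name in sorted(seen):
            node = self.nodes.get(name)
            if node:
                lines.append(f"{node.name}: {node.text} | visual: {node.visual}")
        return "\n".join(lines)

# Usage: build a tiny graph and pool context for a "craft wooden_pickaxe" task.
kg = CrossModalKG()
kg.add_node(Node("wooden_pickaxe", "crafted from 3 planks + 2 sticks"))
kg.add_node(Node("planks", "crafted from 1 log", visual={"count": 0}))
kg.add_node(Node("log", "obtained by chopping trees", visual={"count": 2}))
kg.add_edge("wooden_pickaxe", "planks")
kg.add_edge("planks", "log")
print(kg.retrieve_pool(["wooden_pickaxe"]))
```

The pooled string would then be injected into the LLM prompt as task-relevant context, which is the general role such a retrieval step plays in agent frameworks of this kind.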
CausalMACE: Causality Empowered Multi-Agents in Minecraft Cooperative Tasks
Qi Chai | Zhang Zheng | Junlong Ren | Deheng Ye | Zichuan Lin | Hao Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Minecraft, as an open-world virtual interactive environment, has become a prominent platform for research on agent decision-making and execution. Existing works primarily adopt a single Large Language Model (LLM) agent to complete various in-game tasks. However, for complex tasks requiring lengthy sequences of actions, single-agent approaches often face challenges related to inefficiency and limited fault tolerance. Despite these issues, research on multi-agent collaboration remains scarce. In this paper, we propose CausalMACE, a holistic causality planning framework designed to enhance multi-agent systems, in which we incorporate causality to manage dependencies among subtasks. Technically, our framework introduces two modules: an overarching task graph for global task planning and a causality-based module for dependency management, in which inherent rules are adopted to perform causal intervention. Experimental results demonstrate that our approach achieves state-of-the-art performance in multi-agent cooperative tasks in Minecraft. The code will be open-sourced upon the acceptance of this paper.
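As a rough illustration of dependency management over a subtask graph, the sketch below topologically orders subtasks and dispatches each ready batch across multiple agents. The task graph contents, scheduling loop, and round-robin assignment are assumptions made for illustration and are not CausalMACE's actual method.

```python
# Illustrative sketch only: the dependency graph and scheduling policy are assumptions.
from graphlib import TopologicalSorter

# Subtask dependency graph: each key maps to the subtasks it depends on
# (e.g. logs must be gathered before planks can be crafted).
task_graph = {
    "gather_logs": set(),
    "craft_planks": {"gather_logs"},
    "craft_sticks": {"craft_planks"},
    "craft_table": {"craft_planks"},
    "craft_pickaxe": {"craft_sticks", "craft_table"},
}

def schedule(task_graph, agents):
    """Dispatch subtasks in dependency order; subtasks whose prerequisites
    are all satisfied can run in parallel on different agents."""
    ts = TopologicalSorter(task_graph)
    ts.prepare()
    plan = []
    while ts.is_active():
        ready = list(ts.get_ready())
        # Assign the ready batch across agents (simple round-robin).
        for i, task in enumerate(ready):
            plan.append((agents[i % len(agents)], task))
        ts.done(*ready)
    return plan

for agent, task in schedule(task_graph, agents=["agent_A", "agent_B"]):
    print(f"{agent} -> {task}")
```

Ordering subtasks by their dependencies before assignment is what allows several agents to work concurrently without violating prerequisite constraints, which is the general motivation for a dependency-aware planner in multi-agent settings.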
Co-authors
- Junlong Ren 2
- Hao Wang (汪浩, 王昊, 王浩) 2
- Deheng Ye 2
- Yujun Cai 1
- Honghao Fu 1