MuSe: Multi-Stage Graph Reasoning via Vision-Language Models

Guanyu Wang, Xu Chu, Zhijie Tan, Xinrong Chen, Tong Mo, Weiping Li


Abstract
Graph-related tasks are traditionally addressed with Graph Neural Networks (GNNs) or graph transformers, but their task-specific training limits generalization. Large Language Models (LLMs) offer stronger generalization, yet encoding graphs as one-dimensional text struggles to capture multi-hop dependencies and two-dimensional topology. Vision-Language Models (VLMs) provide an alternative by visualizing graphs, but rendering large graphs in a single image causes clutter, occlusion, and distraction, hindering reasoning. We propose MuSe, a novel multi-stage graph reasoning framework based on VLMs. Instead of processing entire graphs at once, MuSe incrementally samples and visualizes task-relevant subgraphs, enabling progressive reasoning. The framework employs a two-stage training paradigm: supervised fine-tuning to acquire local sampling and reasoning skills, followed by reinforcement learning with GRPO to refine the sampling strategy and control dialog length. To support evaluation, we introduce LGVLQA, a new multimodal dataset with larger and more complex graph structures, addressing the scalability limitations of existing benchmarks. Experiments show that MuSe consistently outperforms leading LLM and VLM baselines, demonstrating improved structural understanding and reasoning ability.
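The idea of reasoning over incrementally sampled subgraphs, rather than the whole graph at once, can be illustrated with a minimal sketch. This is not the authors' implementation: MuSe learns its sampling strategy with a VLM, whereas the code below substitutes a fixed hop-by-hop breadth-first expansion as a stand-in. The function names `k_hop_subgraph` and `staged_reachability`, the adjacency-dict graph format, and the `max_stages` parameter are all hypothetical and chosen only for illustration.

```python
from collections import deque


def k_hop_subgraph(adj, seeds, k):
    """Return the set of nodes within k hops of any seed (plain BFS)."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond the hop budget
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def staged_reachability(adj, source, target, max_stages=4):
    """Grow a local view around `source` one hop per stage until `target` appears.

    Assumption: a fixed one-hop expansion replaces the learned sampling policy.
    Returns (stage, view) on success, or (None, last_view) if the budget runs out.
    """
    view = {source}
    for stage in range(1, max_stages + 1):
        view = k_hop_subgraph(adj, [source], stage)
        # In a VLM pipeline, this small view would be rendered as an image
        # and passed to the model instead of the full graph.
        if target in view:
            return stage, view
    return None, view


# A 5-node path graph: 0 - 1 - 2 - 3 - 4
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
stage, view = staged_reachability(path_graph, source=0, target=3)
```

Each stage exposes only a small neighbourhood, so any rendering of the view stays uncluttered. The `max_stages` cap loosely mirrors the abstract's concern with controlling dialog length, although here it is a hard limit rather than a learned behaviour.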
Anthology ID:
2026.acl-long.476
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
10442–10462
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.476/
Cite (ACL):
Guanyu Wang, Xu Chu, Zhijie Tan, Xinrong Chen, Tong Mo, and Weiping Li. 2026. MuSe: Multi-Stage Graph Reasoning via Vision-Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 10442–10462, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
MuSe: Multi-Stage Graph Reasoning via Vision-Language Models (Wang et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.476.pdf
Checklist:
 2026.acl-long.476.checklist.pdf