Zezhong Ding
2026
See or Say Graphs: Agent-Driven Scalable Graph Understanding with Vision-Language Models
Shuo Han | Yukun Cao | Zezhong Ding | Zengyi Gao | S Kevin Zhou | Xike Xie
Findings of the Association for Computational Linguistics: ACL 2026
Vision-language models (VLMs) have shown promise in graph understanding, but remain limited by input-token constraints, facing scalability bottlenecks and lacking effective mechanisms to coordinate textual and visual modalities. To address these challenges, we propose GraphVista, a unified framework that enhances both scalability and modality coordination in graph understanding. For scalability, GraphVista organizes graph information hierarchically into a lightweight GraphRAG base, which retrieves only task-relevant textual descriptions and high-resolution visual subgraphs, compressing redundant context while preserving key reasoning elements. For modality coordination, GraphVista introduces a planning agent that routes tasks to the most suitable modality—using the text modality for simple property reasoning and the visual modality for local and structurally complex reasoning grounded in explicit topology. Extensive experiments demonstrate that GraphVista scales to large graphs, up to 200× larger than those used in existing benchmarks, and consistently outperforms existing textual, visual, and fusion-based methods, achieving up to 4.4× quality improvement over the state-of-the-art baselines by fully exploiting the complementary strengths of both modalities.
2025
GraphInsight: Unlocking Insights in Large Language Models for Graph Structure Understanding
Yukun Cao | Shuo Han | Zengyi Gao | Zezhong Ding | Xike Xie | S Kevin Zhou
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Although Large Language Models (LLMs) have demonstrated potential in processing graphs, they struggle with comprehending graphical structure information through prompts of graph description sequences, especially as the graph size increases. We attribute this challenge to the uneven memory performance of LLMs across different positions in graph description sequences, known as “positional bias”. To address this, we propose GraphInsight, a novel framework aimed at improving LLMs’ comprehension of both macro- and micro-level graphical information. GraphInsight is grounded in two key strategies: 1) placing critical graphical information in positions where LLMs exhibit stronger memory performance, and 2) investigating a lightweight external knowledge base for regions with weaker memory performance, inspired by retrieval-augmented generation (RAG). Moreover, GraphInsight explores integrating these two strategies into LLM agent processes for composite graph tasks that require multi-step reasoning. Extensive empirical studies on benchmarks with a wide range of evaluation tasks show that GraphInsight significantly outperforms all other graph description methods (e.g., prompting techniques and reordering strategies) in understanding graph structures of varying sizes.