Shuo Han
2026
See or Say Graphs: Agent-Driven Scalable Graph Understanding with Vision-Language Models
Shuo Han | Yukun Cao | Zezhong Ding | Zengyi Gao | S Kevin Zhou | Xike Xie
Findings of the Association for Computational Linguistics: ACL 2026
Vision-language models (VLMs) have shown promise in graph understanding, but remain limited by input-token constraints, facing scalability bottlenecks and lacking effective mechanisms to coordinate textual and visual modalities. To address these challenges, we propose GraphVista, a unified framework that enhances both scalability and modality coordination in graph understanding. For scalability, GraphVista organizes graph information hierarchically into a lightweight GraphRAG base, which retrieves only task-relevant textual descriptions and high-resolution visual subgraphs, compressing redundant context while preserving key reasoning elements. For modality coordination, GraphVista introduces a planning agent that routes tasks to the most suitable modality—using the text modality for simple property reasoning and the visual modality for local and structurally complex reasoning grounded in explicit topology. Extensive experiments demonstrate that GraphVista scales to large graphs, up to 200× larger than those used in existing benchmarks, and consistently outperforms existing textual, visual, and fusion-based methods, achieving up to 4.4× quality improvement over the state-of-the-art baselines by fully exploiting the complementary strengths of both modalities.
RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs in Medicine
Jiatan Huang | Mingchen Li | Zonghai Yao | Dawei Li | Yuxin Zhang | Zhichao Yang | Yongkang Xiao | Feiyun Ouyang | Xiaohan Li | Shuo Han | Hong Yu
Findings of the Association for Computational Linguistics: ACL 2026
Answering complex real-world questions in the medical domain often requires accurate retrieval from medical Textual Knowledge Graphs (medical TKGs), as the relational path information from TKGs could enhance the inference ability of Large Language Models (LLMs). However, the main bottlenecks lie in the scarcity of existing medical TKGs, the limited expressiveness of their topological structures, and the lack of comprehensive evaluations of current retrievers for medical TKGs. To address these challenges, we first develop a dataset for LLMs Complex Reasoning over medical Textual Knowledge Graphs (RiTeK), covering a broad range of topological structures. Specifically, we synthesize realistic user queries integrating diverse topological structures, relational information, and complex textual descriptions. We conduct a rigorous medical expert evaluation process to assess and validate the quality of our synthesized queries. RiTeK also serves as a comprehensive benchmark dataset for evaluating the capabilities of retrieval systems built upon LLMs. By assessing 11 representative retrievers on this benchmark, we observe that existing methods struggle to perform well, revealing notable limitations in current LLM-driven retrieval approaches. These findings highlight the pressing need for more effective retrieval systems tailored for semi-structured data in the medical domain.
2025
From Scores to Steps: Diagnosing and Improving LLM Performance in Evidence-Based Medical Calculations
Benlu Wang | Iris Xia | Yifan Zhang | Junda Wang | Feiyun Ouyang | Shuo Han | Arman Cohan | Hong Yu | Zonghai Yao
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have demonstrated promising performance on medical benchmarks; however, their ability to perform medical calculations, a crucial aspect of clinical decision-making, remains underexplored and poorly evaluated. Existing benchmarks often assess only the final answer with a wide numerical tolerance, overlooking systematic reasoning failures and potentially causing serious clinical misjudgments. In this work, we revisit medical calculation evaluation with a stronger focus on clinical trustworthiness. First, we clean and restructure the MedCalc-Bench dataset and propose a new step-by-step evaluation pipeline that independently assesses formula selection, entity extraction, and arithmetic computation. Under this granular framework, the accuracy of GPT-4o drops from 62.7% to 43.6%, revealing errors masked by prior evaluations. Second, we introduce an automatic error analysis framework that generates structured attribution for each failure mode. Human evaluation confirms its alignment with expert judgment, enabling scalable and explainable diagnostics. Finally, we propose a modular agentic pipeline, MedRaC, that combines retrieval-augmented generation and Python-based code execution. Without any fine-tuning, MedRaC improves the accuracy of different LLMs from 16.35% up to 53.19%. Our work highlights the limitations of current benchmark practices and proposes a more clinically faithful methodology. By enabling transparent and transferable reasoning evaluation, we move closer to making LLM-based systems trustworthy for real-world medical applications.
RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models
Hieu Tran | Zonghai Yao | Zhichao Yang | Junda Wang | Yifan Zhang | Shuo Han | Feiyun Ouyang | Hong Yu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This work introduces RARE (Retrieval-Augmented Reasoning Enhancement), a versatile extension to the mutual reasoning framework (rStar), aimed at enhancing reasoning accuracy and factual integrity across large language models (LLMs) for complex, knowledge-intensive tasks such as medical and commonsense reasoning. RARE incorporates two innovative actions within the Monte Carlo Tree Search (MCTS) framework: (A6), which generates search queries based on the initial problem statement, performs information retrieval using those queries, and augments reasoning with the retrieved data to formulate the final answer; and (A7), which leverages information retrieval specifically for generated sub-questions and re-answers these sub-questions with the relevant contextual information. Additionally, a Retrieval-Augmented Factuality Scorer is proposed to replace the original discriminator, prioritizing reasoning paths that meet high standards of factuality. Experimental results with LLaMA 3.1 show that RARE enables open-source LLMs to achieve competitive performance with top closed-source models like GPT-4 and GPT-4o. This research establishes RARE as a scalable solution for improving LLMs in domains where logical coherence and factual integrity are critical.
GraphInsight: Unlocking Insights in Large Language Models for Graph Structure Understanding
Yukun Cao | Shuo Han | Zengyi Gao | Zezhong Ding | Xike Xie | S Kevin Zhou
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Although Large Language Models (LLMs) have demonstrated potential in processing graphs, they struggle with comprehending graphical structure information through prompts of graph description sequences, especially as the graph size increases. We attribute this challenge to the uneven memory performance of LLMs across different positions in graph description sequences, known as “Positional bias”. To address this, we propose GraphInsight, a novel framework aimed at improving LLMs’ comprehension of both macro- and micro-level graphical information. GraphInsight is grounded in two key strategies: 1) placing critical graphical information in positions where LLMs exhibit stronger memory performance, and 2) investigating a lightweight external knowledge base for regions with weaker memory performance, inspired by retrieval-augmented generation (RAG). Moreover, GraphInsight explores integrating these two strategies into LLM agent processes for composite graph tasks that require multi-step reasoning. Extensive empirical studies on benchmarks with a wide range of evaluation tasks show that GraphInsight significantly outperforms all other graph description methods (e.g., prompting techniques and reordering strategies) in understanding graph structures of varying sizes.
2021
A Secure and Efficient Federated Learning Framework for NLP
Chenghong Wang | Jieren Deng | Xianrui Meng | Yijue Wang | Ji Li | Sheng Lin | Shuo Han | Fei Miao | Sanguthevar Rajasekaran | Caiwen Ding
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
In this work, we consider the problem of designing secure and efficient federated learning (FL) frameworks for NLP. Existing solutions in this line of work either assume a trusted aggregator or require heavyweight cryptographic primitives, which significantly degrades performance. Moreover, many existing secure FL designs work only under the restrictive assumption that no client drops out of the training protocol. To tackle these problems, we propose SEFL, a secure and efficient federated learning framework that (1) eliminates the need for trusted entities; (2) achieves similar or even better model accuracy compared with existing FL designs; and (3) is resilient to client dropouts.
Co-authors
- Feiyun Ouyang (3)
- Zonghai Yao (3)
- Hong Yu (3)
- Yukun Cao (2)
- Zezhong Ding (2)
- Zengyi Gao (2)
- Junda Wang (2)
- Xike Xie (2)
- Zhichao Yang (2)
- Yifan Zhang (2)
- S Kevin Zhou (2)
- Arman Cohan (1)
- Jieren Deng (1)
- Caiwen Ding (1)
- Jiatan Huang (1)
- Dawei Li (1)
- Ji Li (1)
- Mingchen Li (1)
- Xiaohan Li (1)
- Sheng Lin (1)
- Xianrui Meng (1)
- Fei Miao (1)
- Sanguthevar Rajasekaran (1)
- Hieu Tran (1)
- Benlu Wang (1)
- Chenghong Wang (1)
- Yijue Wang (1)
- Iris Xia (1)
- Yongkang Xiao (1)
- Yuxin Zhang (1)