Boci Peng
2026
COSMOS: Connectivity-Oriented Submodular Maximization for Optimal Subgraph Retrieval
Boci Peng | Xiao Liu | Boren Hu | Yun Zhu | Xuanbo Fan | Yanwei Yue | Chunyu Yang | Yan Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Retrieving coherent evidence subgraphs is critical for Knowledge Base Question Answering (KBQA). Existing paradigms often treat facts independently, rely on biased heuristics, or employ myopic search, and so fail to optimize collective subgraph utility. In this paper, we propose **COSMOS** (**C**onnectivity-**O**riented **S**ubmodular **M**aximization for **O**ptimal **S**ubgraph Retrieval), a unified framework that formalizes evidence retrieval as a constrained submodular maximization problem. This formulation mathematically captures the trade-off between information relevance and structural complexity. To solve this combinatorial problem tractably, COSMOS employs a decompose-and-conquer strategy: it first performs a seed-guided greedy expansion to maximize local semantic utility, then applies a topology-aware component aggregation that bridges disjoint evidence clusters via a Maximum Spanning Tree. Guided by theoretical bounds, we introduce Structure-Aware Contrastive Tuning to align the semantic space with KG topology. Experimental results on the WebQSP, CWQ, and M³GQA benchmarks demonstrate that COSMOS achieves state-of-the-art performance.
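The two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the coverage-style utility, the cardinality budget, the `covers` mapping, and the function names `greedy_expand` and `maximum_spanning_tree` are all assumptions chosen for illustration. The first function grows a connected subgraph from a seed entity by greedily adding the adjacent triple with the largest marginal gain under a simple monotone submodular utility; the second builds a maximum spanning tree over components with Kruskal's algorithm, which is one plausible way to link disjoint clusters.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def coverage(selected: List[Triple], covers: Dict[Triple, Set[str]]) -> int:
    """Hypothetical utility: number of distinct query aspects covered.

    Set coverage is monotone and submodular, so it stands in for the
    (unspecified) utility function of the paper.
    """
    covered: Set[str] = set()
    for triple in selected:
        covered |= covers[triple]
    return len(covered)


def greedy_expand(seed: str, covers: Dict[Triple, Set[str]], budget: int) -> List[Triple]:
    """Greedily add triples adjacent to the current subgraph, up to `budget`."""
    selected: List[Triple] = []
    nodes: Set[str] = {seed}
    while len(selected) < budget:
        base = coverage(selected, covers)
        best, best_gain = None, 0
        for triple in sorted(covers):  # sorted for deterministic tie-breaking
            head, _, tail = triple
            if triple in selected or (head not in nodes and tail not in nodes):
                continue  # already chosen, or not connected to the subgraph
            gain = coverage(selected + [triple], covers) - base
            if gain > best_gain:
                best, best_gain = triple, gain
        if best is None:  # no adjacent triple adds utility
            break
        selected.append(best)
        nodes.update((best[0], best[2]))
    return selected


def maximum_spanning_tree(n: int, edges: List[Tuple[float, int, int]]) -> List[Tuple[float, int, int]]:
    """Kruskal's algorithm on edges sorted by descending weight.

    Nodes 0..n-1 stand for evidence components; weights are assumed
    (hypothetical) affinity scores between components.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two separate components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Toy data, invented for illustration only.
covers = {
    ("q", "born_in", "paris"): {"birthplace"},
    ("q", "lives_in", "paris"): {"birthplace"},
    ("paris", "capital_of", "france"): {"country"},
    ("x", "likes", "y"): {"hobby"},
}
subgraph = greedy_expand("q", covers, budget=2)
bridges = maximum_spanning_tree(3, [(0.9, 0, 1), (0.2, 1, 2), (0.5, 0, 2)])
```

On the toy data, the redundant `lives_in` triple is never selected because it adds no new coverage once `born_in` is chosen, which is the diminishing-returns behavior that a submodular objective is meant to capture.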
2025
M³GQA: A Multi-Entity Multi-Hop Multi-Setting Graph Question Answering Benchmark
Boci Peng | Yongchao Liu | Xiaohe Bo | Jiaxin Guo | Yun Zhu | Xuanbo Fan | Chuntao Hong | Yan Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recently, GraphRAG systems have achieved remarkable progress in enhancing the performance and reliability of large language models (LLMs). However, most previous benchmarks are template-based and focus primarily on few-entity queries, which are monotypic and simplistic and therefore fail to offer comprehensive and robust assessments. In addition, the lack of ground-truth reasoning paths hinders the assessment of individual components in GraphRAG systems. To address these limitations, we propose M³GQA, a complex, diverse, and high-quality GraphRAG benchmark focusing on multi-entity queries, with six distinct settings for comprehensive evaluation. To construct diverse data with semantically correct ground-truth reasoning paths, we introduce a novel reasoning-driven four-step data construction method consisting of tree sampling, reasoning path backtracking, query creation, and multi-stage refinement and filtering. Extensive experiments demonstrate that M³GQA effectively reflects the capabilities of GraphRAG methods, offering valuable insights into model performance and reliability. By pushing the boundaries of current methods, M³GQA establishes a comprehensive, robust, and reliable benchmark for advancing GraphRAG research.