Xiaohe Bo

2025

Recently, GraphRAG systems have achieved remarkable progress in enhancing the performance and reliability of large language models (LLMs). However, most previous benchmarks are template-based and primarily focus on few-entity queries, which are monotypic and simplistic, failing to offer comprehensive and robust assessments. Besides, the lack of ground-truth reasoning paths also hinders the assessment of individual components in GraphRAG systems. To address these limitations, we propose M³GQA, a complex, diverse, and high-quality GraphRAG benchmark focusing on multi-entity queries, with six distinct settings for comprehensive evaluation. To construct diverse data with semantically correct ground-truth reasoning paths, we introduce a novel reasoning-driven four-step data construction method, consisting of tree sampling, reasoning path backtracking, query creation, and multi-stage refinement and filtering. Extensive experiments demonstrate that M³GQA effectively reflects the capabilities of GraphRAG methods, offering valuable insights into model performance and reliability. By pushing the boundaries of current methods, M³GQA establishes a comprehensive, robust, and reliable benchmark for advancing GraphRAG research.

Incorporating Review-missing Interactions for Generative Explainable Recommendation
Xi Li | Xiaohe Bo | Chen Ma | Xu Chen
Proceedings of the 31st International Conference on Computational Linguistics

Explainable recommendation has attracted much attention from the academic and industry communities. Traditional models usually leverage user reviews as ground truths for model training, and the interactions without reviews are totally ignored. However, in practice, a large number of users may not leave reviews after purchasing items. In this paper, we argue that the interactions without reviews may also contain comprehensive user preferences, and incorporating them to build an explainable recommender model may further improve the explanation quality. Following this intuition, we first leverage generative models to predict the missing reviews, and then train the recommender model based on all the predicted and original reviews. Specifically, since the reviews are discrete tokens, we regard the review generation process as a reinforcement learning problem, where each token is an action at one step. We hope that the generated reviews are indistinguishable from the real ones. Thus, we introduce a discriminator as a reward model to evaluate the quality of the generated reviews. Finally, to smooth the review generation process, we introduce a self-paced learning strategy to first generate shorter reviews and then predict the longer ones. We conduct extensive experiments on three publicly available datasets to demonstrate the effectiveness of our model.

Co-authors

Yongchao Liu 1

Chen Ma 1

Boci Peng 1

Yan Zhang (张琰, 张廷) 1

Yun Zhu 1

Venues

acl 1
coling 1
