Shufei Zhang
2025
ReKG-MCTS: Reinforcing LLM Reasoning on Knowledge Graphs via Training-Free Monte Carlo Tree Search
Xiaozhuang Song | Shufei Zhang | Tianshu Yu
Findings of the Association for Computational Linguistics: ACL 2025
Recent advancements in combining knowledge graphs (KGs) with large language models (LLMs) have demonstrated promising potential in complex KG reasoning tasks, yet existing approaches either rely on limited path exploration strategies or incur excessive computational overhead. We propose ReKG-MCTS, a novel training-free framework that synergizes Monte Carlo Tree Search (MCTS) with LLM capabilities to enable dynamic reasoning over KGs. The framework conceptualizes KG reasoning as a decision-making process, where MCTS strategically explores paths over the KG while LLMs provide semantic guidance for the reasoning paths. The framework consists of four phases: (1) UCB-based node selection that balances exploration and exploitation on the KG, (2) path expansion with KG structural constraints, (3) LLM-guided Monte Carlo rollouts for simulation, and (4) value backpropagation. Experimental results on WebQSP and CWQ demonstrate that ReKG-MCTS outperforms existing training-free methods and achieves competitive performance compared to fine-tuned baselines. These findings suggest a new paradigm for leveraging language models in KG reasoning tasks. The code is available at https://github.com/ShawnKS/rekgmcts.
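To make the four phases concrete, here is a minimal Python sketch of one MCTS iteration over a KG. It is an illustration under stated assumptions, not the paper's implementation: the `Node` class, `kg_neighbors` (a KG adjacency lookup), and `llm_score_path` (an LLM-based semantic scorer returning a reward in [0, 1]) are hypothetical stand-ins.

```python
import math
import random

class Node:
    """A search-tree node anchored at a KG entity (hypothetical structure)."""
    def __init__(self, entity, parent=None):
        self.entity = entity
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb(node, c=1.4):
    # Phase 1: UCB score balancing exploitation (mean value) and exploration.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def select(root):
    # Descend to a leaf, always following the child with the highest UCB score.
    node = root
    while node.children:
        node = max(node.children, key=ucb)
    return node

def expand(node, kg_neighbors):
    # Phase 2: expansion constrained to edges that actually exist in the KG.
    for neighbor in kg_neighbors(node.entity):
        node.children.append(Node(neighbor, parent=node))

def rollout(node, kg_neighbors, llm_score_path, depth=3):
    # Phase 3: Monte Carlo rollout; an LLM scores the sampled reasoning path.
    path, entity = [node.entity], node.entity
    for _ in range(depth):
        neighbors = kg_neighbors(entity)
        if not neighbors:
            break
        entity = random.choice(neighbors)
        path.append(entity)
    return llm_score_path(path)

def backpropagate(node, reward):
    # Phase 4: propagate the rollout reward from the leaf back to the root.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

One iteration would chain these as `leaf = select(root)`, `expand(leaf, kg_neighbors)`, `reward = rollout(leaf, kg_neighbors, llm_score_path)`, `backpropagate(leaf, reward)`; the paper's actual selection policy, constraints, and reward design may differ.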
LLaMA-Berry: Pairwise Optimization for Olympiad-level Mathematical Reasoning via O1-like Monte Carlo Tree Search
Di Zhang | Jianbo Wu | Jingdi Lei | Tong Che | Jiatong Li | Tong Xie | Xiaoshui Huang | Shufei Zhang | Marco Pavone | Yuqiang Li | Wanli Ouyang | Dongzhan Zhou
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
This paper presents LLaMA-Berry, an advanced mathematical reasoning framework that enhances the problem-solving ability of large language models (LLMs). The framework combines Monte Carlo Tree Search with Self-Refine (SR-MCTS) to optimize reasoning paths and utilizes a pairwise reward model to evaluate different paths globally. By leveraging the self-critique and rewriting capabilities of LLMs, SR-MCTS overcomes the inefficiencies and limitations of conventional step-wise and greedy search algorithms, enabling more efficient exploration of solution spaces. To guide the search process, we propose the Pairwise Preference Reward Model (PPRM), which predicts pairwise preferences between solutions via instruction-following capabilities trained with Reinforcement Learning from Human Feedback (RLHF). Finally, the Enhanced Borda Count (EBC) method is adopted to synthesize pairwise preferences into global quantile scores for evaluation. This approach mitigates the challenges of scoring variability and non-independent distributions in mathematical reasoning tasks. The framework has been tested on general and advanced benchmarks, showing superior search efficiency and performance compared to existing open-source and closed-source methods, particularly on complex Olympiad-level benchmarks, including AIME24 and AMC23.
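As a rough illustration of the aggregation step, the sketch below turns pairwise preferences into global quantile scores in the spirit of a Borda count. It is an assumption-laden toy, not the paper's EBC method: `prefer(a, b)` stands in for a PPRM query and its interface is invented here.

```python
import itertools

def borda_quantile_scores(solutions, prefer):
    """Aggregate pairwise preferences into quantile scores in (0, 1]."""
    # Each solution earns one Borda point per pairwise comparison it wins.
    wins = {s: 0 for s in solutions}
    for a, b in itertools.combinations(solutions, 2):
        if prefer(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    # Rank solutions by Borda count, then map ranks to quantiles,
    # so the globally preferred solution receives a score of 1.0.
    ranked = sorted(solutions, key=lambda s: wins[s])
    n = len(ranked)
    return {s: (i + 1) / n for i, s in enumerate(ranked)}

# Toy usage with a placeholder preference function in place of a PPRM call.
scores = borda_quantile_scores(
    ["sol_a", "sol_b", "sol_c"],
    prefer=lambda a, b: a < b,
)
print(scores)  # e.g. {'sol_c': 0.33..., 'sol_b': 0.66..., 'sol_a': 1.0}
```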