RASPberry: Retrieval-Augmented Monte Carlo Tree Self-Play with Reasoning Consistency for Multi-Hop Question Answering
Baixuan Li, Yunlong Fan, Tianyi Ma, Miao Gao, Chuanqi Shi, Zhiqiang Gao
Abstract
Complex multi-hop question answering requires large language models (LLMs) not only to retrieve external knowledge but also to reason over the retrieved information in order to arrive at the final solution. This involves two key challenges: (i) how to effectively explore the solution space and generate more potentially correct solution candidates, and (ii) how to select the optimal solution from multiple solution candidates, both of which require a training-free approach without introducing a more powerful teacher model. To address these challenges, we propose Retrieval-Augmented Monte Carlo Tree Self-Play with Reasoning Consistency (RASPberry), which introduces a more flexible action-level sampling granularity compared to existing methods, leverages Monte Carlo Tree Search for efficient solution space exploration, and utilizes an enhanced version of reasoning consistency to guide the selection of the optimal solution. Experimental results demonstrate that our proposed RASPberry effectively tackles the two challenges outlined above, achieving more efficient RAG inference-time scaling. Our code is available at https://github.com/BaixuanLi/RASPberry.
- Anthology ID:
- 2025.findings-acl.587
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2025
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 11258–11276
- URL:
- https://preview.aclanthology.org/landing_page/2025.findings-acl.587/
- Cite (ACL):
- Baixuan Li, Yunlong Fan, Tianyi Ma, Miao Gao, Chuanqi Shi, and Zhiqiang Gao. 2025. RASPberry: Retrieval-Augmented Monte Carlo Tree Self-Play with Reasoning Consistency for Multi-Hop Question Answering. In Findings of the Association for Computational Linguistics: ACL 2025, pages 11258–11276, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- RASPberry: Retrieval-Augmented Monte Carlo Tree Self-Play with Reasoning Consistency for Multi-Hop Question Answering (Li et al., Findings 2025)
- PDF:
- https://preview.aclanthology.org/landing_page/2025.findings-acl.587.pdf