BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering

Taolin Zhang; Dongyang Li; Qizhou Chen; Chengyu Wang; Xiaofeng He

Abstract

Multi-hop question answering (QA) involves finding multiple relevant passages and performing step-by-step reasoning to answer complex questions. Previous work on multi-hop QA applies specific methods, drawn from different modeling perspectives and built on large language models (LLMs), regardless of the question type. In this paper, we first conduct an in-depth analysis of public multi-hop QA benchmarks, dividing the questions into four types and evaluating five types of cutting-edge methods for multi-hop QA: Chain-of-Thought (CoT), Single-step, Iterative-step, Sub-step, and Adaptive-step. We find that different types of multi-hop questions have varying degrees of sensitivity to different types of methods. Thus, we propose a Bi-levEL muLti-agEnt reasoning (BELLE) framework that addresses multi-hop QA by focusing specifically on the correspondence between question types and methods, where each type of method is regarded as an "operator" realized by prompting LLMs differently. The first level of BELLE includes multiple agents that debate to obtain an executive plan of combined "operators" to address the multi-hop QA task comprehensively. During the debate, in addition to the basic roles of affirmative debater, negative debater, and judge, at the second level we further leverage fast and slow debaters to monitor whether changes in viewpoints are reasonable. Extensive experiments demonstrate that BELLE significantly outperforms strong baselines on various datasets. Additionally, BELLE is more cost-effective in terms of model consumption than single models in more complex multi-hop QA scenarios.

Anthology ID: 2025.acl-long.211
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 4184–4202
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.211/
Cite (ACL): Taolin Zhang, Dongyang Li, Qizhou Chen, Chengyu Wang, and Xiaofeng He. 2025. BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4184–4202, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): BELLE: A Bi-Level Multi-Agent Reasoning Framework for Multi-Hop Question Answering (Zhang et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.211.pdf
