Bohan Yao


2025

pdf bib
Diverse Multi-tool Aggregation with Large Language Models for Enhanced Math Reasoning
Bohan Yao | Vikas Yadav
Findings of the Association for Computational Linguistics: EMNLP 2025

Tool usage is a proven technique for developing high-performance reasoning in large language models (LLMs). Our work is focused on emphasizing the utility of leveraging multiple diverse tools for complex reasoning tasks. We present Multi-TAG, a Multi-Tool AGgregation-based LLM framework that utilizes multiple diverse tools to solve complex math problems over multiple reasoning steps. At each reasoning step, Multi-TAG invokes multiple tools and accepts the solution of the respective step by tools that have majority agreement on the final answer estimate. Multi-TAG strongly outperforms several standard baselines that use individual tools with the same number of runs, highlighting the importance of multi-tool invocation for solving complex reasoning tasks. We also show that naive aggregation of multiple tools at each reasoning step also leads to substantial improvements of up to 35% accuracy. Multi-TAG then further improves these gains by 7.4% on average on MATH500, AIME, AMC, and OlympiadBench.