Wanfu Wang

2025

As large language models (LLMs) are increasingly applied to complex scientific problem-solving, their effectiveness is often limited by unconscious or failed tool usage. To address this issue, we introduce the Tool-Awareness Training (TAT) method, designed to enhance scientific reasoning. This approach leverages both forward and backward data generation strategies to strengthen the model’s conscious and selective tool utilization in multi-step reasoning tasks. Our method unfolds in three stages: (1) developing tool knowledge through backward tool-use data generation, (2) enhancing tool awareness in multi-step reasoning by utilizing forward reasoning data, and (3) improving domain adaptability through large-scale domain-specific data for multi-task learning. These three stages progressively establish the foundation for tool learning and scientific reasoning and integrate the two, enabling the model to tackle multi-domain scientific tasks while optimizing tool usage. Our experimental results demonstrate that TAT significantly enhances LLM performance in mathematical and scientific reasoning tasks, particularly by improving the model’s tool utilization capabilities, including proactivity and execution success rates.

In inference-time scaling, Chain-of-Thought (CoT) plays a crucial role in enabling large language models (LLMs) to exhibit reasoning capabilities. However, in many scenarios, high-quality CoT data is scarce or even unavailable. In such cases, STaR-like methods can help LLMs synthesize CoT based on user queries and responses, but they inevitably suffer from the risk of compounding errors. In this work, we tackle an even more challenging scenario: tool learning in the absence of user queries. We design a data scaling method using back-translation, which establishes an inference cycle to synthesize both user queries and CoT data. To reduce compounding errors during inference, we introduce two rule-based verifiers to assess the validity of the synthesized CoT data. In particular, the Cycle Verifier facilitates performance improvement by continuously accumulating new data over multiple iterations. Our approach achieves a 75.4% pass rate and a 79.6% win rate using small models (7B) in StableToolBench. Notably, these results are obtained exclusively from self-synthesized high-quality data, without relying on external supervision or expert trajectories for warm-up.

Co-authors

Wenjing Xie 1

Wenjin Xie 1

Qiaoming Zhu (朱巧明) 1

Venues

findings 2
ws 2
