EMTIR-GRPO: Efficient Multi-Tool Augmented Large Language Models via Reinforcement Learning
Shixin Jiang, Zhihao Zhu, Jiafeng Liang, Yang Wu, Ming Liu, Bing Qin
Abstract
Tool-integrated reasoning (TIR) enables large language models (LLMs) to invoke external tools for tasks beyond their internal capacity but often suffers from tool overuse. Existing approaches leverage imitation learning or reward shaping to improve efficiency, yet mainly target single-tool scenarios and ignore the varying invocation costs across tools in multi-tool reasoning (MTIR). To address these gaps, we propose EMTIR-GRPO, a simple yet effective RL algorithm for cost-aware MTIR. Built upon GRPO, we introduce a composite reward considering format completeness, answer correctness, and tool efficiency. By incorporating a cost-aware coefficient with group optimal cost estimation, EMTIR-GRPO explicitly models heterogeneous tool costs and encourages more cost-effective tool-use strategies. Experiments on MTIR-QA and MTIR-TC demonstrate significant efficiency gains (e.g., 𝛥+10.9 on Tool-Star-7B and 𝛥+3.6 on ReCall-7B) while maintaining or even improving accuracy (e.g., 55.4 vs. 52.0 on Tool-Star-7B). Additional budget-constrained and tool-free evaluations further validate its effectiveness in maximizing cost-efficiency and reducing cognitive offloading.
- Anthology ID:
- 2026.findings-acl.1388
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 27877–27894
- Language:
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1388/
- DOI:
- Cite (ACL):
- Shixin Jiang, Zhihao Zhu, Jiafeng Liang, Yang Wu, Ming Liu, and Bing Qin. 2026. EMTIR-GRPO: Efficient Multi-Tool Augmented Large Language Models via Reinforcement Learning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 27877–27894, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- EMTIR-GRPO: Efficient Multi-Tool Augmented Large Language Models via Reinforcement Learning (Jiang et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1388.pdf
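
The abstract describes a composite reward (format completeness, answer correctness, tool efficiency) with a cost-aware coefficient based on a group optimal cost estimate, built on GRPO. The sketch below is one possible reading of that description, not the paper's formulation: the tool names and costs in `TOOL_COSTS`, the weights `w_format` and `w_eff`, the coefficient `(1 + optimal_cost) / (1 + cost)`, and the choice of "cheapest correct rollout in the group" as the optimal cost are all assumptions made for illustration. Only the group-relative normalization of rewards follows the standard GRPO recipe.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical per-call costs; the abstract does not list tools or prices.
TOOL_COSTS = {"search": 1.0, "code": 0.5}


@dataclass
class Rollout:
    format_ok: bool        # output follows the required format
    correct: bool          # final answer matches the reference
    tool_calls: list[str]  # names of the tools invoked, in order


def rollout_cost(rollout: Rollout) -> float:
    """Total invocation cost of one rollout under TOOL_COSTS."""
    return sum(TOOL_COSTS[name] for name in rollout.tool_calls)


def group_optimal_cost(group: list[Rollout]):
    """Assumed estimate: the cheapest correct rollout in the sampled group."""
    costs = [rollout_cost(r) for r in group if r.correct]
    return min(costs) if costs else None


def composite_reward(rollout, optimal_cost, w_format=0.1, w_eff=0.5) -> float:
    """Format + correctness + efficiency; weights are illustrative only."""
    reward = w_format if rollout.format_ok else 0.0
    if not rollout.correct or optimal_cost is None:
        return reward  # no efficiency bonus without a correct answer
    # Assumed coefficient in (0, 1]: equals 1 at the group optimum and
    # shrinks as the rollout spends more than the cheapest correct one.
    coeff = (1.0 + optimal_cost) / (1.0 + rollout_cost(rollout))
    return reward + 1.0 + w_eff * coeff


def group_rewards(group: list[Rollout]) -> list[float]:
    optimal = group_optimal_cost(group)
    return [composite_reward(r, optimal) for r in group]


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: reward standardized within its group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Under these assumptions, two correct rollouts receive different rewards when one reaches the answer with cheaper tool calls, so the group-relative advantage favors the cheaper strategy without rewarding incorrect answers for skipping tools.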