What Moves the Pareto Frontier in Tool-Using Agents? A Compute-Aware Study of ReAct Variants

Rishi N. Simhadri


Abstract
Tool-using LLM agents are typically compared by accuracy alone, even though deployments are constrained by inference cost. We present a budgeted evaluation of common strategies for improving ReAct-style tool agents (multi-sample aggregation, iterative self-correction, and post-hoc answer revision) using Pareto analysis of cumulative accuracy versus token budget on three benchmarks (HotPotQA, FEVER, GSM8K) with Gemini 2.5 Flash. All experiments use three random seeds (N=500 per seed for HotPotQA/FEVER; N=1,015 for GSM8K); budgeted curves are computed post hoc from per-instance token logs. In our offline evaluation, Reflexion attains the highest accuracy on the tool-heavy benchmarks (HotPotQA, FEVER), while CoT-SC leads on GSM8K. Reflexion’s reported token costs and accuracy are both optimistic: retries are stopped using ground-truth feedback, so the costs are lower bounds, and a deployment without gold labels could not apply the same stopping criterion and would differ on both measures. Sampling-based approaches often spend 3-5x more tokens for comparatively small gains on tool-heavy tasks. On GSM8K, a pure-math benchmark with minimal tool interaction, CoT-SC, TCAR, and Reflexion show substantially larger gains than on the tool-heavy benchmarks, though they are less sharply separated than headline accuracy alone would suggest. This pattern is consistent with repeated tool trajectories being an important contributor to the observed efficiency gap in our tool-heavy settings. We provide a compute-aware evaluation protocol (frontier analysis and marginal-cost metrics) and practical guidance for choosing agent designs under different budget regimes.
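The abstract describes budgeted curves computed post hoc from per-instance token logs, together with frontier analysis and marginal-cost metrics. The Python sketch below illustrates one plausible reading of those three quantities; it is not the paper's implementation. The log layout (a list of `(tokens_used, is_correct)` pairs), the function names, and the exact definitions (an instance counts as solved at a budget only if it is correct and used no more tokens than the budget; marginal cost is extra tokens per accuracy point gained) are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed log format: one (tokens_used, is_correct) pair per evaluated instance.
InstanceLog = List[Tuple[int, bool]]
# Assumed summary format: (mean_tokens_per_instance, accuracy in [0, 1]).
Point = Tuple[float, float]


def budgeted_accuracy(log: InstanceLog, budget: int) -> float:
    """Fraction of instances that are correct AND fit within the token budget."""
    if not log:
        return 0.0
    solved = sum(1 for tokens, ok in log if ok and tokens <= budget)
    return solved / len(log)


def pareto_frontier(points: Dict[str, Point]) -> List[str]:
    """Names of methods not dominated by any other (lower cost, higher accuracy)."""
    frontier = []
    for name, (cost, acc) in points.items():
        dominated = any(
            c2 <= cost and a2 >= acc and (c2 < cost or a2 > acc)
            for other, (c2, a2) in points.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    # Order the frontier from cheapest to most expensive.
    return sorted(frontier, key=lambda n: points[n][0])


def marginal_cost(base: Point, other: Point) -> Optional[float]:
    """Extra tokens spent per accuracy point gained over a baseline.

    Returns None when the other method does not improve accuracy.
    """
    gain_points = (other[1] - base[1]) * 100
    if gain_points <= 0:
        return None
    return (other[0] - base[0]) / gain_points


if __name__ == "__main__":
    # Made-up toy values for illustration only; not results from the paper.
    toy_log = [(100, True), (300, True), (500, False), (200, False)]
    for budget in (250, 1000):
        print(budget, budgeted_accuracy(toy_log, budget))

    toy_points = {"A": (1000.0, 0.60), "B": (4000.0, 0.62), "C": (5000.0, 0.61)}
    print(pareto_frontier(toy_points))
    print(marginal_cost(toy_points["A"], toy_points["B"]))
```

In this toy setup, method C is dropped from the frontier because B is both cheaper and more accurate. Sweeping `budget` over a grid and plotting `budgeted_accuracy` per method would give a budgeted curve under the stated assumptions.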
Anthology ID:
2026.acl-srw.82
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
923–940
URL:
https://preview.aclanthology.org/ingestion-form-platform/2026.acl-srw.82/
Cite (ACL):
Rishi N. Simhadri. 2026. What Moves the Pareto Frontier in Tool-Using Agents? A Compute-Aware Study of ReAct Variants. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 923–940, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
What Moves the Pareto Frontier in Tool-Using Agents? A Compute-Aware Study of ReAct Variants (Simhadri, ACL 2026)
PDF:
https://preview.aclanthology.org/ingestion-form-platform/2026.acl-srw.82.pdf