Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms

Xinlin Wang

Abstract

Despite the impressive capabilities of large language models, their substantial computational costs, latency, and privacy risks hinder their widespread deployment in real-world applications. Small Language Models (SLMs) with fewer than 10 billion parameters present a promising alternative; however, their inherent limitations in knowledge and reasoning curtail their effectiveness. Existing research primarily focuses on enhancing SLMs through scaling laws or fine-tuning strategies while overlooking the potential of using agent paradigms, such as tool use and multi-agent collaboration, to systematically compensate for the inherent weaknesses of small models. To address this gap, this paper presents the first large-scale, comprehensive study of <10B open-source models under three paradigms: (1) the base model, (2) a single agent equipped with tools, and (3) a routing-based multi-agent system with collaborative capabilities. Our results show that structured agent frameworks (combining step-by-step reasoning and tool use) substantially improve effectiveness over direct prompting, with single-agent systems achieving the best balance between performance and cost. In contrast, routing-based multi-agent setups introduce additional coordination overhead with limited gains under small-model constraints. Our findings highlight the importance of agent-centric design for efficient and trustworthy deployment in resource-constrained settings.

Anthology ID: 2026.acl-industry.123
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Yunyao Li, Georg Rehm, Mei Tu
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1795–1807
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-industry.123/
Cite (ACL): Xinlin Wang. 2026. Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1795–1807, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Rethinking Scale: Deployment Trade-offs of Small Language Models under Agent Paradigms (Wang, ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-industry.123.pdf