When 20 Agents Fail to Sort: The Distributed Sorting Benchmark for Scalable Multi-Agent Systems

Xin Yang, Junhao Wang, Bintao Tang, Xuxin Cheng, Cao Liu, Ke Zeng, Wenyuan Jiang


Abstract
Current LLM-based multi-agent systems remain fragile under scaling, even on algorithmically trivial tasks. We introduce MAS-BENCH, a distributed-sorting benchmark that isolates coordination under explicit communication constraints: each agent observes only a local segment and must collectively produce a globally consistent order via broadcasting, peer-to-peer messaging, or a shared key-value store. Across LLM-based agents, success drops sharply as the number of agents grows, exposing persistent failures in shared state, convention alignment, and consistent termination. To mitigate these breakdowns, we propose CAMOC, a lightweight, drop-in proof-of-concept built on collaboration-aware information sharing, early global metadata exchange, and single-commit verification. CAMOC substantially improves coordination success and efficiency across backends, with the largest gains under shared-state interaction. Overall, MAS-BENCH provides a diagnostic benchmark and CAMOC offers a practical step toward more reliable large-scale LLM collaboration, highlighting a gap between individual reasoning and collective correctness.
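The abstract does not specify MAS-BENCH's interfaces, so the following is only a minimal, hypothetical Python sketch of the task it describes. Each agent sees only its local segment. Agents coordinate through a shared key-value store. An early metadata exchange lets every agent compute global positions, and a single checker verifies the full order before one commit. The names `SharedStore`, `Agent`, `verify_and_commit`, and `distributed_sort`, the `(value, agent_id, local_index)` tie-breaking convention, and the rank-by-counting protocol are all illustrative assumptions. They are not the benchmark's or CAMOC's actual implementation. In the benchmark itself, the agents are LLMs exchanging messages, not deterministic functions.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Hypothetical shared convention: ties between equal values are broken by
# (agent_id, local_index), so every element has a unique, agreed-upon key.
Key = Tuple[int, int, int]  # (value, agent_id, local_index)


class SharedStore:
    """Hypothetical shared key-value store visible to all agents."""

    def __init__(self) -> None:
        self.data: Dict[str, object] = {}
        self.committed = False

    def put(self, key: str, value: object) -> None:
        if self.committed:
            raise RuntimeError("store is read-only after commit")
        self.data[key] = value

    def get(self, key: str) -> object:
        return self.data[key]  # KeyError if a position was never written


class Agent:
    """Hypothetical agent that observes only its own local segment."""

    def __init__(self, agent_id: int, segment: List[int]) -> None:
        self.agent_id = agent_id
        self.segment = segment

    def keys(self) -> List[Key]:
        return sorted((v, self.agent_id, i) for i, v in enumerate(self.segment))

    def publish_metadata(self, store: SharedStore) -> None:
        # Early metadata exchange: each agent announces its sorted keys.
        store.put(f"meta/{self.agent_id}", self.keys())

    def write_positions(self, store: SharedStore, n_agents: int) -> None:
        others = [store.get(f"meta/{a}") for a in range(n_agents) if a != self.agent_id]
        for local_rank, key in enumerate(self.keys()):
            # Global rank = own smaller keys + strictly smaller keys held by others.
            rank = local_rank + sum(bisect_left(keys, key) for keys in others)
            store.put(f"pos/{rank}", key[0])


def verify_and_commit(store: SharedStore, total: int) -> List[int]:
    # Single-commit verification: one checker validates the whole order, then commits once.
    result = [store.get(f"pos/{r}") for r in range(total)]
    if any(result[i] > result[i + 1] for i in range(total - 1)):
        raise ValueError("inconsistent global order")
    store.committed = True
    return result


def distributed_sort(values: List[int], n_agents: int) -> List[int]:
    chunk = -(-len(values) // n_agents)  # ceiling division; trailing agents may get empty segments
    agents = [Agent(a, values[a * chunk:(a + 1) * chunk]) for a in range(n_agents)]
    store = SharedStore()
    for agent in agents:
        agent.publish_metadata(store)
    for agent in agents:
        agent.write_positions(store, n_agents)
    return verify_and_commit(store, len(values))
```

For example, `distributed_sort([5, 3, 9, 1, 3, 7, 2, 8], 3)` returns `[1, 2, 3, 3, 5, 7, 8, 9]`. Without the agreed tie-breaking convention, two agents holding the value 3 could both claim the same position. That is a small instance of the convention-alignment failures the abstract reports.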
Anthology ID:
2026.findings-acl.1698
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
34002–34021
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1698/
Cite (ACL):
Xin Yang, Junhao Wang, Bintao Tang, Xuxin Cheng, Cao Liu, Ke Zeng, and Wenyuan Jiang. 2026. When 20 Agents Fail to Sort: The Distributed Sorting Benchmark for Scalable Multi-Agent Systems. In Findings of the Association for Computational Linguistics: ACL 2026, pages 34002–34021, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
When 20 Agents Fail to Sort: The Distributed Sorting Benchmark for Scalable Multi-Agent Systems (Yang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1698.pdf
Checklist:
 2026.findings-acl.1698.checklist.pdf