When 20 Agents Fail to Sort: The Distributed Sorting Benchmark for Scalable Multi-Agent Systems
Xin Yang, Junhao Wang, Bintao Tang, Xuxin Cheng, Cao Liu, Ke Zeng, Wenyuan Jiang
Abstract
Current LLM-based multi-agent systems remain fragile under scaling, even on algorithmically trivial tasks. We introduce MAS-BENCH, a distributed-sorting benchmark that isolates coordination under explicit communication constraints: each agent observes only a local segment and must collectively produce a globally consistent order via broadcasting, peer-to-peer messaging, or a shared key-value store. Across LLM-based agents, success drops sharply as the number of agents grows, exposing persistent failures in shared state, convention alignment, and consistent termination. To mitigate these breakdowns, we propose CAMOC, a lightweight, drop-in proof-of-concept built on collaboration-aware information sharing, early global metadata exchange, and single-commit verification. CAMOC substantially improves coordination success and efficiency across backends, with the largest gains under shared-state interaction. Overall, MAS-BENCH provides a diagnostic benchmark and CAMOC offers a practical step toward more reliable large-scale LLM collaboration, highlighting a gap between individual reasoning and collective correctness.
- Anthology ID:
- 2026.findings-acl.1698
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 34002–34021
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1698/
- Cite (ACL):
- Xin Yang, Junhao Wang, Bintao Tang, Xuxin Cheng, Cao Liu, Ke Zeng, and Wenyuan Jiang. 2026. When 20 Agents Fail to Sort: The Distributed Sorting Benchmark for Scalable Multi-Agent Systems. In Findings of the Association for Computational Linguistics: ACL 2026, pages 34002–34021, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- When 20 Agents Fail to Sort: The Distributed Sorting Benchmark for Scalable Multi-Agent Systems (Yang et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1698.pdf