Bintao Tang
2026
When 20 Agents Fail to Sort: The Distributed Sorting Benchmark for Scalable Multi-Agent Systems
Xin Yang | Junhao Wang | Bintao Tang | Xuxin Cheng | Cao Liu | Ke Zeng | Wenyuan Jiang
Findings of the Association for Computational Linguistics: ACL 2026
Current LLM-based multi-agent systems remain fragile under scaling, even on algorithmically trivial tasks. We introduce MAS-BENCH, a distributed-sorting benchmark that isolates coordination under explicit communication constraints: each agent observes only a local segment and must collectively produce a globally consistent order via broadcasting, peer-to-peer messaging, or a shared key-value store. Across LLM-based agents, success drops sharply as the number of agents grows, exposing persistent failures in shared state, convention alignment, and consistent termination. To mitigate these breakdowns, we propose CAMOC, a lightweight, drop-in proof-of-concept built on collaboration-aware information sharing, early global metadata exchange, and single-commit verification. CAMOC substantially improves coordination success and efficiency across backends, with the largest gains under shared-state interaction. Overall, MAS-BENCH provides a diagnostic benchmark and CAMOC offers a practical step toward more reliable large-scale LLM collaboration, highlighting a gap between individual reasoning and collective correctness.
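The shared key-value setting described in the abstract can be illustrated with a small, non-LLM simulation. This is a minimal sketch under stated assumptions, not the MAS-BENCH harness or the CAMOC implementation: the function names (`publish_metadata`, `merge_and_commit`), the key layout (`agent/<id>`, `result`), and the metadata fields are all hypothetical. It only shows the general idea of each agent publishing its local segment with summary metadata up front, followed by one verified write of the global order.

```python
import heapq


def publish_metadata(store, agent_id, segment):
    """Hypothetical agent step: sort the local segment and publish it
    together with summary metadata (count, min, max) to the shared store."""
    local = sorted(segment)
    store[f"agent/{agent_id}"] = {
        "count": len(local),
        "min": local[0] if local else None,
        "max": local[-1] if local else None,
        "sorted": local,
    }


def merge_and_commit(store, num_agents):
    """Hypothetical coordinator step: merge all published segments,
    verify the result once, and only then write a single global answer."""
    entries = [store.get(f"agent/{i}") for i in range(num_agents)]
    if any(entry is None for entry in entries):
        raise RuntimeError("at least one agent has not published its segment")

    # k-way merge of the already-sorted local segments.
    merged = list(heapq.merge(*(entry["sorted"] for entry in entries)))

    # Verification before the single commit: no element lost, order is global.
    expected_count = sum(entry["count"] for entry in entries)
    in_order = all(a <= b for a, b in zip(merged, merged[1:]))
    if len(merged) != expected_count or not in_order:
        raise ValueError("verification failed; refusing to commit")

    store["result"] = merged  # the only write of the global result
    return merged


# Three simulated agents, each seeing only its own segment.
segments = [[9, 2, 7], [4, 1], [8, 3, 6, 5]]
store = {}
for agent_id, segment in enumerate(segments):
    publish_metadata(store, agent_id, segment)

result = merge_and_commit(store, len(segments))
print(result)
```

In this deterministic version the three failure modes named in the abstract cannot occur, which is the point of the contrast: the algorithm is trivial, and the difficulty reported for LLM agents lies in agreeing on the key layout, keeping the shared state consistent, and terminating with exactly one committed answer.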