LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

Saaket Agashe, Yue Fan, Anthony Reyna, Xin Eric Wang


Abstract
Large Language Models (LLMs) have demonstrated emergent common-sense reasoning and Theory of Mind (ToM) capabilities, making them promising candidates for developing coordination agents. This study introduces the LLM-Coordination Benchmark, a novel benchmark for analyzing LLMs in the context of Pure Coordination Settings, where agents must cooperate to maximize gains. Our benchmark evaluates LLMs through two distinct tasks. The first is Agentic Coordination, where LLMs act as proactive participants in four pure coordination games. The second is Coordination Question Answering (CoordQA), which tests LLMs on 198 multiple-choice questions across these games to evaluate three key abilities: Environment Comprehension, ToM Reasoning, and Joint Planning. Results from the Agentic Coordination experiments reveal that LLM agents excel in multi-agent coordination settings where decision-making relies primarily on environmental variables, but face challenges in scenarios requiring active consideration of partners’ beliefs and intentions. The CoordQA experiments further highlight significant room for improvement in LLMs’ Theory of Mind reasoning and joint planning capabilities. Zero-Shot Coordination (ZSC) experiments in the Agentic Coordination setting demonstrate that LLM agents, unlike RL methods, exhibit robustness to unseen partners. These findings indicate the potential of LLMs as agents in pure coordination setups and underscore areas for improvement.
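To give a concrete picture of what a CoordQA-style evaluation involves, the sketch below scores a model on a set of multiple-choice questions. This is a minimal illustration, not the benchmark's actual harness: the `query_llm` callable, the question schema, and the answer-parsing rule are all assumptions; see the linked paper and PDF for the real protocol.

```python
# Hypothetical sketch of a CoordQA-style multiple-choice evaluation loop.
# The question schema and `query_llm` interface are illustrative assumptions.
from typing import Callable, Dict, List


def format_prompt(question: Dict) -> str:
    """Render one multiple-choice question as a single prompt string."""
    options = "\n".join(
        f"{letter}. {text}" for letter, text in question["options"].items()
    )
    return (
        f"{question['context']}\n\n"
        f"Question: {question['question']}\n{options}\n"
        "Answer with a single option letter."
    )


def evaluate(questions: List[Dict], query_llm: Callable[[str], str]) -> float:
    """Return the model's accuracy over the question set."""
    correct = 0
    for q in questions:
        reply = query_llm(format_prompt(q)).strip()
        # Take the first character of the reply as the predicted option letter.
        if reply and reply[0].upper() == q["answer"]:
            correct += 1
    return correct / len(questions)
```

In this setup, each ability category (Environment Comprehension, ToM Reasoning, Joint Planning) would simply be a labeled subset of the question list, so per-ability accuracy falls out of the same loop.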
Anthology ID: 2025.findings-naacl.448
Volume: Findings of the Association for Computational Linguistics: NAACL 2025
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8038–8057
URL: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.448/
Cite (ACL): Saaket Agashe, Yue Fan, Anthony Reyna, and Xin Eric Wang. 2025. LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 8038–8057, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models (Agashe et al., Findings 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.findings-naacl.448.pdf