Scalability of LLM-Based Multi-Agent Systems for Scientific Code Generation: A Preliminary Study

Yuru Wang, Kaiyan Zhang, Kai Tian, Sihang Zeng, Xingtai Lv, Ning Ding, Biqing Qi, Bowen Zhou


Abstract
Recent studies indicate that LLM-based Multi-Agent Systems (MAS) encounter scalability challenges in complex mathematical problem-solving or coding tasks, exhibiting issues such as inconsistent role adherence and ineffective inter-agent communication. Moreover, the performance advantages of LLM-based MAS over a single agent employing test-time scaling methods (e.g., majority voting) remain marginal. This raises a critical question: Can LLM-based MAS scale effectively to achieve performance comparable to standalone LLMs or even Large Reasoning Models (LRMs) under optimal test-time compute? In this paper, we conduct a preliminary investigation into the scalability of LLM-based MAS for scientific code generation. We propose a simple yet scalable two-player framework based on iterative critic-in-the-loop refinement. Our experiments demonstrate that a minimalist actor-critic framework based on DeepSeek-V3 can outperform DeepSeek-R1 under equivalent computational budgets. Surprisingly, more complex frameworks fail to yield significant gains. These findings corroborate recent insights into multi-agent system limitations and highlight the importance of scalable workflows for advancing scientific code generation.
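The two-player iterative critic-in-the-loop refinement the abstract describes can be sketched as a simple actor-critic loop under a fixed round budget. The sketch below is an illustration only, not the authors' implementation: the `actor` and `critic` functions stand in for calls to LLMs (e.g., DeepSeek-V3 in the actor role), and all names and the stubbed behavior are hypothetical.

```python
def actor(task, feedback=None):
    """Stand-in for an LLM actor that drafts or revises code for a task.

    A real implementation would call a model such as DeepSeek-V3;
    here we return placeholder strings so the loop structure is runnable.
    """
    if feedback is None:
        return f"# draft solution for: {task}"
    return f"# revised solution for: {task} (addressed: {feedback})"


def critic(code):
    """Stand-in for an LLM critic: return None to approve, else feedback."""
    if "revised" in code:
        return None  # critic is satisfied; stop refining
    return "handle edge cases"


def refine(task, max_rounds=4):
    """Iterative critic-in-the-loop refinement under a round budget.

    The budget caps test-time compute, mirroring the equal-compute
    comparison framing in the abstract.
    """
    code = actor(task)
    for _ in range(max_rounds):
        feedback = critic(code)
        if feedback is None:
            break
        code = actor(task, feedback)
    return code
```

The loop terminates either when the critic approves or when the round budget is exhausted, which is what makes the total compute of the two-player system directly comparable to a single model's test-time scaling budget.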
Anthology ID:
2025.mathnlp-main.4
Volume:
Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Marco Valentino, Deborah Ferreira, Mokanarangan Thayaparan, Leonardo Ranaldi, Andre Freitas
Venues:
MathNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
50–61
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.mathnlp-main.4/
Cite (ACL):
Yuru Wang, Kaiyan Zhang, Kai Tian, Sihang Zeng, Xingtai Lv, Ning Ding, Biqing Qi, and Bowen Zhou. 2025. Scalability of LLM-Based Multi-Agent Systems for Scientific Code Generation: A Preliminary Study. In Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025), pages 50–61, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Scalability of LLM-Based Multi-Agent Systems for Scientific Code Generation: A Preliminary Study (Wang et al., MathNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.mathnlp-main.4.pdf