Jiale Hong


2026

Existing multi-LLM collaboration systems often encounter scalability challenges when integrating new LLMs and tasks, leading to suboptimal performance. To address this, we propose SMCS, a Scalable Multi-LLM Collaboration System designed to effectively coordinate multiple open-source LLMs. The system consists of two core components: a Retrieval-based Prior Selection (RPS) module, which dynamically selects the most suitable LLMs for each input, and an Exploration–Exploitation-Driven Posterior Enhancement (EPE) module, which fosters response diversity and selects high-quality outputs through a hybrid scoring mechanism. Experiments on eight mainstream benchmarks validate the effectiveness of our system: by integrating fifteen open-source LLMs, SMCS outperforms prevailing closed-source LLMs, e.g., GPT-4.1(**+5.36%**) and GPT-o3-mini(**+5.28%**) across multiple tasks. Remarkably, it even exceeds the average of best results on different datasets with open-source LLMs (**+2.86%**), significantly advancing the empirical performance frontier of open-source collaboration. The code is released at https://github.com/magent4aci/SMCS.

2025

Game development is a highly specialized task that relies on a complex game engine powered by complex programming languages, preventing many gaming enthusiasts from handling it. This paper introduces the Chat Game Engine (ChatGE) powered by LLM, which allows everyone to develop a custom game using natural language through Human-LLM interaction. To enable an LLM to function as a ChatGE, we instruct it to perform the following processes in each turn: (1) Pscript: configure the game script segment based on the user’s input; (2) Pcode: generate the corresponding code snippet based on the game script segment; (3) Putter: interact with the user, including guidance and feedback. We propose a data synthesis pipeline based on LLM to generate game script-code pairs and interactions from a few manually crafted seed data. We propose a three-stage training strategy following curriculum learning principles to transfer the dialogue-based LLM to our ChatGE smoothly. We construct a ChatGE for poker games as a case study and comprehensively evaluate it from two perspectives: interaction quality and code correctness.