Ronghui Yang

2026

Large Language Models (LLMs) have shown remarkable capabilities in automating code generation. Recent approaches that incorporate feedback refinement mechanisms into the generation process have further enhanced software generation quality. However, these methods are single-path approaches: they explore too little of the vast solution space, so even the most powerful models often get stuck in local optima and struggle to generate the desired software. Other works use Monte Carlo Tree Search (MCTS) to explore multiple paths toward the best solution, yet MCTS can be extremely inefficient in practice. To this end, we propose SeDev, a novel LLM-driven code generation framework that efficiently finds high-quality solutions in only a few iterations. The core idea of SeDev is to gradually explore semantically adjacent solutions through structured prompt guidance and feedback on previous trials, while using unit tests to evaluate the quality of exploration. To distill the exploration experience, SeDev incorporates a feedback synthesis module that translates the unit test results gathered during exploration into comprehensive suggestions. We construct a challenging feature-oriented software benchmark, FSD-bench++, and evaluate on it along with two open datasets. Experimental results show that SeDev outperforms baselines while maintaining reasonable time and computational costs. Code is available here.

2025


LLM-based multi-agent frameworks have shown great potential in solving real-world software development tasks, where agents in different roles can communicate much more efficiently than humans. Despite this efficiency, LLM-based agents rarely understand each other fully, which frequently causes errors during the development process. Moreover, the accumulation of errors can easily lead to the failure of the whole project. To reduce such errors, we introduce RTADev, an intention-aligned multi-agent framework that uses a self-correction mechanism to ensure that all agents work from a consensus. RTADev mimics human teams, in which individuals are free to call a meeting at any time to reach agreement. Specifically, RTADev integrates an alignment checking phase and a conditional ad hoc group review phase, so that errors can be effectively reduced with minimal agent communication. Our experiments on various software development tasks show that RTADev significantly improves the quality of generated software code in terms of executability, structural completeness, and functional completeness. The code of our project is available at https://github.com/codeagent-rl/RTADev.

Co-authors

Jie Liu 1

Jiexin Wang 1

Guohua Wang 1

Venues

ACL 1
Findings 1
