Shengyao Lu

2026

Large language models (LLMs) have achieved remarkable progress in automatic code generation, yet their ability to produce high-performance code remains limited, despite its importance in real-world software systems. We argue that this limitation stems not only from data scarcity, but more fundamentally from the lack of supervision that guides interpretable and effective performance improvements. We introduce PerfCoder, a family of LLMs designed to generate performance-enhanced code through interpretable and customized optimization strategies. PerfCoder is fine-tuned on curated real-world optimization trajectories with human-readable annotations and further aligned via reinforcement fine-tuning using runtime feedback, enabling it to generate input-specific strategies and apply them directly without iterative refinement. On the PIE code performance benchmark, PerfCoder outperforms all existing models in both runtime speedup and effective optimization rate, demonstrating that code performance optimization requires strategy awareness rather than scale alone. Moreover, PerfCoder produces interpretable feedback that can guide larger LLMs in a planner–optimizer workflow, substantially improving the performance of 32B models and GPT-5 on code optimization.

2025

The fine-tuning of Large Language Models (LLMs) specialized in code generation has seen notable advancements through the use of open-domain coding queries. Despite these successes, existing methodologies like Evol-Instruct encounter performance limitations, impeding further enhancements in code generation tasks. This paper examines the constraints of existing prompt evolution techniques and introduces a novel approach, Instruction Fusion (IF). IF combines two distinct prompts through a hybridization process, thereby enhancing the evolution of training prompts for code LLMs. Our experimental results reveal that the proposed method effectively addresses the shortcomings of prior methods, significantly improving the performance of Code LLMs across five code generation benchmarks, namely HumanEval, HumanEval+, MBPP, MBPP+ and MultiPL-E. These results underscore the effectiveness of Instruction Fusion in advancing the capabilities of LLMs in code generation.

Co-authors

Yu Xu 1

Venues

COLING 1
Findings 1
