Bin She


2026

Neural Processing Units (NPUs) are critical for AI infrastructure, yet kernel development remains a bottleneck because of the complexity of vendor-specific Domain-Specific Languages (DSLs). Although LLMs excel at general-purpose coding, they fail to meet the stringent constraints of NPU development, achieving a near-zero success rate on complex kernels in our preliminary study. To address these challenges, we present AscendKernelGen, the first comprehensive framework for NPU kernel development. The framework consists of three interconnected components: (1) Ascend-CoT, the first dataset in the NPU kernel domain that incorporates chain-of-thought reasoning derived from real-world kernel implementations; (2) KernelGen-LM, a domain-adapted model trained on this dataset using supervised fine-tuning and reinforcement learning; and (3) NPUKernelBench, the first benchmark platform designed to evaluate the compilation, correctness, and performance of generated NPU kernels. Experimental results show that our approach substantially narrows the gap in hardware-specific coding: compilation success on complex Level-2 kernels improves from 0% to 95.5% (Pass@10), with 64% functional correctness. AscendKernelGen is available at AscendKernGen and NPUKernelBench.
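The Pass@10 figure above can be made concrete with a short sketch. The abstract does not state how Pass@k is computed, so the code below assumes the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per task and c of them pass; the function names `pass_at_k` and `mean_pass_at_k` are hypothetical and are not taken from NPUKernelBench.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one task as 1 - C(n-c, k) / C(n, k).

    n: number of samples generated for the task
    c: number of those samples that pass (e.g. compile successfully)
    k: sampling budget being evaluated (k = 10 for Pass@10)

    Assumption: this is the standard unbiased estimator, not necessarily
    the exact procedure used by the benchmark.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, given one (n, c) pair per task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With exactly ten samples per task, `pass_at_k(10, c, 10)` is 1.0 whenever at least one sample passes and 0.0 otherwise, so the benchmark-level score is simply the fraction of tasks with at least one passing sample.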