Yuanbo Wen
2026
QiMeng-PRepair: Precise Code Repair via Edit-Aware Reward Optimization
Changxin Ke | Rui Zhang | Jiaming Guo | Yuanbo Wen | Li Ding | Shuo Wang | Xuyuan Zhu | Xiong Peng | Di Huang | Zidong Du | Xing Hu | Qi Guo | Yunji Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) achieve strong program repair performance but often suffer from over-editing, where excessive modifications overwrite correct code and hinder bug localization. We systematically quantify its impact and introduce the precise repair task, which maximizes reuse of correct code while fixing only the buggy parts. Building on this insight, we propose PRepair, a framework that mitigates over-editing and improves repair accuracy. PRepair has two components: Self-Breaking, which generates diverse buggy programs via controlled bug injection and min–max sampling, and Self-Repairing, which trains models with Edit-Aware Group Relative Policy Optimization (EA-GRPO) using an edit-aware reward to encourage minimal yet correct edits. Experiments show that PRepair improves repair precision by up to 31.4% under fix1@1, a metric that jointly considers repair correctness and extent, and significantly increases decoding throughput when combined with speculative editing, demonstrating its potential for precise and practical code repair.
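The abstract does not give the form of the edit-aware reward, so the following is only an illustrative sketch of the general idea: a reward that is zero for an incorrect repair and, for a correct one, shrinks as more of the original program is rewritten. The function names, the `penalty` weight, and the use of a line-level `difflib` similarity ratio as the edit measure are all assumptions made for this example, not the reward used in EA-GRPO.

```python
import difflib


def edit_fraction(buggy: str, repaired: str) -> float:
    """Rough share of the program that changed (hypothetical edit measure).

    Computed as 1 minus difflib's line-level similarity ratio, so identical
    programs give 0.0 and programs with no lines in common give 1.0.
    """
    matcher = difflib.SequenceMatcher(
        a=buggy.splitlines(), b=repaired.splitlines()
    )
    return 1.0 - matcher.ratio()


def edit_aware_reward(
    buggy: str, repaired: str, passes_tests: bool, penalty: float = 0.5
) -> float:
    """Assumed reward shape: correctness gated, then discounted by edit size.

    `penalty` (an assumed hyperparameter in [0, 1)) controls how strongly
    large edits are discouraged among repairs that pass the tests.
    """
    if not passes_tests:
        return 0.0  # an incorrect repair earns nothing, however small the edit
    return 1.0 - penalty * edit_fraction(buggy, repaired)


if __name__ == "__main__":
    buggy = "def add(a, b):\n    return a - b\n"
    minimal_fix = "def add(a, b):\n    return a + b\n"
    full_rewrite = "def add(x, y):\n    total = x + y\n    return total\n"

    # Both candidates are correct, but the one-line fix scores higher.
    print(edit_aware_reward(buggy, minimal_fix, passes_tests=True))
    print(edit_aware_reward(buggy, full_rewrite, passes_tests=True))
```

Under these assumptions, two correct repairs are ranked by how much of the original code they preserve, which is the behavior the abstract attributes to an edit-aware reward; any real training setup would need the paper's actual definition.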
2025
QiMeng-Attention: SOTA Attention Operator is generated by SOTA Attention Algorithm
Qirui Zhou | Shaohui Peng | Weiqiang Xiong | Haixin Chen | Yuanbo Wen | Haochen Li | Ling Li | Qi Guo | Yongwei Zhao | Ke Gao | Ruizhi Chen | Yanjun Wu | Zhao Chen | Yunji Chen
Findings of the Association for Computational Linguistics: ACL 2025
The attention operator remains a critical performance bottleneck in large language models (LLMs), particularly for long-context scenarios. While FlashAttention is the most widely used and effective GPU-aware acceleration algorithm, it requires time-consuming, hardware-specific manual implementation, limiting adaptability across GPU architectures. Existing LLMs have shown considerable promise in code generation tasks, but struggle to generate high-performance attention code. The key challenge is that they cannot comprehend the complex data flow and computation process of the attention operator or utilize low-level primitives to exploit GPU performance. To address this challenge, we propose an LLM-friendly Thinking Language (LLM-TL) to help LLMs decouple the generation of high-level optimization logic from low-level implementation on GPU, and to enhance LLMs’ understanding of the attention operator. With a two-stage reasoning workflow (TL-Code generation and translation), LLMs can automatically generate FlashAttention implementations on diverse GPUs, establishing a self-optimizing paradigm for generating high-performance attention operators in attention-centric algorithms. Verified on A100, RTX8000, and T4 GPUs, our method significantly outperforms vanilla LLMs, achieving a speed-up of up to 35.16×. Moreover, our method not only surpasses human-optimized libraries (cuDNN and official library) in most scenarios but also extends support to unsupported hardware and data types, reducing development time from months to minutes compared with human experts.