Jiongchi Yu

2026

Large language models have recently advanced automated program repair, yet most existing approaches provide only post-hoc natural-language explanations that are neither executable nor verifiable. This limitation is especially critical for quantum programs, where correctness hinges on subtle semantic properties such as circuit equivalence and fidelity preservation. We propose Explainable Quantum Program Repair, a framework that couples repair generation with machine-checkable executable explanations. Given a buggy quantum circuit, a language model proposes candidate repairs together with structured transformation rationales, which are compiled into proof traces and validated using formal verification backends, including circuit equivalence checking, ZX-calculus reasoning, stabilizer analysis, and quantum simulation. Only repairs whose explanations are fully verified are accepted. Experiments on QASMBench with mutation-generated quantum program bugs demonstrate that our approach achieves competitive repair success while substantially improving semantic precision and explanation faithfulness over baselines that rely on unconstrained or purely natural-language explanations.
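To illustrate the acceptance rule described above, the following is a minimal sketch of a verification gate, not the framework's actual implementation. It assumes single-qubit circuits written as lists of gate names, a hypothetical reference specification `spec` to check the repair against, and a rationale expressed as a list of claimed circuit equivalences. The names `equivalent` and `accept_repair` are illustrative. Only circuit equivalence checking (up to global phase) is shown. The ZX-calculus, stabilizer, and simulation backends are omitted.

```python
import math

# Single-qubit gate matrices as nested lists (assumption: a tiny gate set for the sketch).
S2 = 1 / math.sqrt(2)
GATES = {
    "h": [[S2, S2], [S2, -S2]],
    "x": [[0, 1], [1, 0]],
    "z": [[1, 0], [0, -1]],
    "s": [[1, 0], [0, 1j]],
}


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def unitary(circuit):
    """Unitary of a circuit; gates are applied in list order (first gate first)."""
    u = [[1, 0], [0, 1]]
    for name in circuit:
        u = matmul(GATES[name], u)
    return u


def equivalent(c1, c2, tol=1e-9):
    """Equivalence up to global phase: |Tr(U1^dagger U2)| equals 2 for 2x2 unitaries."""
    u1, u2 = unitary(c1), unitary(c2)
    tr = sum(complex(u1[i][j]).conjugate() * u2[i][j]
             for i in range(2) for j in range(2))
    return abs(abs(tr) - 2) < tol


def accept_repair(candidate, rationale, spec):
    """Accept only if every claimed equivalence in the rationale checks out
    and the candidate itself matches the (hypothetical) reference spec."""
    claims_hold = all(equivalent(lhs, rhs) for lhs, rhs in rationale)
    return claims_hold and equivalent(candidate, spec)


# Hypothetical mutation bug: the z gate in h-z-h was replaced by s.
spec = ["x"]
buggy = ["h", "s", "h"]
good_fix = ["h", "z", "h"]

accepted = accept_repair(good_fix, [(["h", "z", "h"], ["x"])], spec)
rejected = accept_repair(buggy, [(["h", "s", "h"], ["x"])], spec)
```

In this sketch the repaired circuit is accepted because its single claimed rewrite (H Z H equals X) verifies, while the unrepaired circuit is rejected because its claim fails the check. A repair with a plausible-sounding but unverifiable explanation would be rejected in the same way.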

The increasing adoption of large language models (LLMs) in software engineering necessitates rigorous security evaluation of the code they generate. However, existing benchmarks often lack relevance to real-world AI-assisted programming scenarios, making them inadequate for assessing the practical security risks of AI-generated code in production environments. To address this gap, we introduce A.S.E (AI Code Generation Security Evaluation), a repository-level evaluation benchmark designed to closely mirror real-world AI programming tasks, offering a comprehensive and reliable framework for assessing the security of AI-generated code. Our evaluation of leading LLMs on A.S.E reveals several key findings. In particular, current LLMs still struggle with secure coding. The complexity of repository-level scenarios challenges even LLMs that typically perform well on snippet-level tasks. Moreover, a larger reasoning budget does not necessarily lead to better code generation. These observations offer valuable insights into the current state of AI code generation and help developers identify the most suitable models for practical tasks. They also lay the groundwork for refining LLMs to generate secure and efficient code in real-world applications.