Ziming Zhao

2026

Large language models have recently advanced automated program repair, yet most existing approaches provide only post-hoc natural-language explanations that are neither executable nor verifiable. This limitation is especially critical for quantum programs, where correctness hinges on subtle semantic properties such as circuit equivalence and fidelity preservation. We propose Explainable Quantum Program Repair, a framework that couples repair generation with machine-checkable executable explanations. Given a buggy quantum circuit, a language model proposes candidate repairs together with structured transformation rationales, which are compiled into proof traces and validated using formal verification backends, including circuit equivalence checking, ZX-calculus reasoning, stabilizer analysis, and quantum simulation. Only repairs whose explanations are fully verified are accepted. Experiments on QASMBench with mutation-generated quantum program bugs demonstrate that our approach achieves competitive repair success while substantially improving semantic precision and explanation faithfulness over baselines that rely on unconstrained or purely natural-language explanations.
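As a rough illustration of the "accept only fully verified repairs" rule described in this abstract, the sketch below checks a toy proof trace for single-qubit circuits. Everything in it is an assumption made for illustration and is not taken from the paper: the gate set, the representation of a rationale as a chain of intermediate circuits, and the helper names `equivalent` and `accept_repair` are hypothetical. Equivalence up to global phase stands in for the paper's verification backends.

```python
# Toy single-qubit gate set as 2x2 matrices (hypothetical, for illustration only).
S2 = 2 ** -0.5
GATES = {
    "h": [[S2, S2], [S2, -S2]],
    "x": [[0, 1], [1, 0]],
    "z": [[1, 0], [0, -1]],
}


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def unitary(circuit):
    """Unitary of a gate list, with gates applied in list order."""
    u = [[1, 0], [0, 1]]
    for gate in circuit:
        u = matmul(GATES[gate], u)
    return u


def equivalent(c1, c2, tol=1e-9):
    """True if the two circuits implement the same unitary up to a global phase."""
    u, v = unitary(c1), unitary(c2)
    for i in range(2):
        for j in range(2):
            if abs(v[i][j]) > tol:
                # Use the first nonzero entry of v to fix the relative phase.
                if abs(u[i][j]) <= tol:
                    return False
                phase = u[i][j] / v[i][j]
                if abs(abs(phase) - 1) > tol:
                    return False
                return all(abs(u[r][c] - phase * v[r][c]) <= tol
                           for r in range(2) for c in range(2))
    return False


def accept_repair(candidate, chain, reference):
    """Accept a candidate only if every step of its trace verifies.

    Assumption: the rationale is a chain of intermediate circuits leading
    from the candidate repair to a reference circuit.
    """
    trace = [candidate, *chain, reference]
    return all(equivalent(a, b) for a, b in zip(trace, trace[1:]))
```

Under these assumptions, `accept_repair(["h", "z", "h"], [], ["x"])` is accepted because HZH equals X, while `accept_repair(["h", "x", "h"], [], ["x"])` is rejected because HXH equals Z. A single unverifiable step rejects the whole repair, which mirrors the acceptance rule stated in the abstract.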

The increasing adoption of large language models (LLMs) in software engineering necessitates rigorous security evaluation of their generated code. However, existing benchmarks often lack relevance to real-world AI-assisted programming scenarios, making them inadequate for assessing the practical security risks associated with AI-generated code in production environments. To address this gap, we introduce A.S.E (AI Code Generation Security Evaluation), a repository-level evaluation benchmark designed to closely mirror real-world AI programming tasks, offering a comprehensive and reliable framework for assessing the security of AI-generated code. Our evaluation of leading LLMs on A.S.E reveals several key findings. In particular, current LLMs still struggle with secure coding. The complexity in repository-level scenarios presents challenges for LLMs that typically perform well on snippet-level tasks. Moreover, a larger reasoning budget does not necessarily lead to better code generation. These observations offer valuable insights into the current state of AI code generation and help developers identify the most suitable models for practical tasks. They also lay the groundwork for refining LLMs to generate secure and efficient code in real-world applications.


QUARTZ: Quantile-Aware Routing and Queueing for TTFT SLOs in LLM Serving
Zhipeng Liu | Yifan Zheng | Fanqi Kong | Ziming Zhao
Findings of the Association for Computational Linguistics: ACL 2026

Large Language Model (LLM) serving systems increasingly face strict time-to-first-token (TTFT) service-level objectives (SLOs), yet TTFT remains highly sensitive to router-side queueing effects. Prefill costs scale with prompt length, decode lengths are uncertain, and prefix locality creates strong performance skew across requests. Despite major advances in continuous batching and KV-cache management, today’s routers are often agnostic to request cost, which makes them vulnerable to head-of-line blocking and tail-latency amplification under mixed workloads. We propose QUARTZ, a quantile-aware routing and queueing layer for LLM serving that predicts conservative quantile-based request-cost proxies, rather than point estimates, using lightweight router-visible signals. QUARTZ uses these quantiles together with backlog-aware router signals to guide worker selection and admission decisions that better align with TTFT tail SLOs while preserving fairness. We implement QUARTZ as a router upgrade for SGLang and evaluate it on representative interactive and retrieval-augmented workloads. The results show reductions in TTFT tail latency and SLO violations across heterogeneous workloads.
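The idea of routing on a conservative quantile of request cost, rather than a point estimate, can be sketched as follows. This is a minimal illustration under stated assumptions and not the QUARTZ implementation: the class `QuantileRouter`, the prompt-length bucketing, the nearest-rank quantile estimator, and the least-backlog worker choice are all hypothetical stand-ins, and none of it uses the SGLang API.

```python
import math
from collections import defaultdict


def empirical_quantile(samples, q):
    """Nearest-rank empirical quantile of a non-empty list, for 0 < q <= 1."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


class QuantileRouter:
    """Hypothetical router that charges each request its q-quantile cost."""

    def __init__(self, num_workers, q=0.9, bucket_tokens=512, default_cost=1.0):
        self.q = q
        self.bucket_tokens = bucket_tokens    # assumed prompt-length bucket width
        self.default_cost = default_cost      # fallback when no history exists
        self.history = defaultdict(list)      # bucket -> observed costs
        self.backlog = [0.0] * num_workers    # estimated pending cost per worker

    def _bucket(self, prompt_tokens):
        return prompt_tokens // self.bucket_tokens

    def observe(self, prompt_tokens, cost):
        """Record the measured cost of a finished request."""
        self.history[self._bucket(prompt_tokens)].append(cost)

    def predict_cost(self, prompt_tokens):
        """Conservative cost proxy: a high quantile of costs seen in this bucket."""
        samples = self.history.get(self._bucket(prompt_tokens))
        if not samples:
            return self.default_cost
        return empirical_quantile(samples, self.q)

    def route(self, prompt_tokens):
        """Send the request to the worker with the smallest estimated backlog."""
        cost = self.predict_cost(prompt_tokens)
        worker = min(range(len(self.backlog)), key=lambda w: self.backlog[w])
        self.backlog[worker] += cost
        return worker, cost

    def complete(self, worker, cost):
        """Release the backlog that was charged when the request was routed."""
        self.backlog[worker] = max(0.0, self.backlog[worker] - cost)
```

Charging the quantile instead of the mean makes the backlog estimate pessimistic for heavy-tailed buckets, so a worker that already holds a potentially long prefill is less likely to receive another one. Admission control and fairness, which the abstract also mentions, are not modeled in this sketch.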