Cheng Wen
2026
Formally Specifying the Intended Behavior of the Program: LLM-Driven Neuro-Symbolic Program Specification Synthesis
Cheng Wen | Hu Junjie | YiKun Hu | Jie Su | Bin Yu | Dugang Liu | Zhiwu Xu | Weidi Sun | Shengchao Qin | Cong Tian
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Cheng Wen | Hu Junjie | YiKun Hu | Jie Su | Bin Yu | Dugang Liu | Zhiwu Xu | Weidi Sun | Shengchao Qin | Cong Tian
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Formal verification can provide strong mathematical guarantees about software correctness, but it typically requires developers to write detailed formal specifications (e.g., contracts and loop invariants), which is costly and error-prone. We introduce AutoSpec+, an LLM-driven neuro-symbolic demonstration system that reframes specification writing as constrained structured synthesis: large language models generate candidate specifications at the granularity of proof-relevant program components, while a symbolic verifier acts as a deterministic critic that checks legality, satisfiability, and proof adequacy, rejecting or repairing candidates in an iterative loop. This design turns unconstrained text generation into constrained structured synthesis, substantially reducing hallucinations and producing proof-ready annotations. We evaluate AutoSpec+ on seven benchmark suites, showing strong effectiveness. We release an open-source, Dockerized system with ensemble LLM backends and inter-modular verification support for reproducible demonstration and deployment
Bridging Kernel Drivers and Virtual Device Models with LLM-Powered Automation
Mingyu Wang | Bin Yu | Wenjian Lu | Zhi Wang | Gao Kefeng | Cheng Wen | Xu Lu | Cong Tian
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Mingyu Wang | Bin Yu | Wenjian Lu | Zhi Wang | Gao Kefeng | Cheng Wen | Xu Lu | Cong Tian
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Linux kernel device drivers are tightly coupled with hardware, making them difficult to execute and test without physical devices. This heavily limits automated code analysis and vulnerability discovery. While manual modeling is unscalable, Large Language Models (LLMs) offer a new approach to scale virtual device construction across the Linux driver ecosystem. In this paper, we present DevGen, an LLM-powered tool that generates QEMU-based virtual devices directly from Linux driver source code. DevGen combines static analysis to gather necessary context, guides the LLM through step-by-step prompting, and uses an automated self-correction loop driven by compilation and execution feedback. To further reduce errors, similar fixes are retrieved from a library of common modeling failures and incorporated into the repair prompt, which supports more targeted corrections in later iterations. The generated devices finally integrate with QEMU and Syzkaller, enabling driver fuzzing without physical hardware. DevGen is evaluated on 50 PCI/PCIe drivers from Linux 6.18 using three mainstream LLMs, and successfully generates usable models for 44 drivers. In these drivers, 24% of them achieve significant improvements in fuzzing coverage, and 7 previously unknown crashes are triggered with 1 CVE assigned. These results demonstrate the practical capability of LLMs to automate complex, system-level code generation tasks.
2025
From Informal to Formal – Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs
Jialun Cao | Yaojie Lu | Meiziniu Li | Haoyang Ma | Haokun Li | Mengda He | Cheng Wen | Le Sun | Hongyu Zhang | Shengchao Qin | Shing-Chi Cheung | Cong Tian
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Jialun Cao | Yaojie Lu | Meiziniu Li | Haoyang Ma | Haokun Li | Mengda He | Cheng Wen | Le Sun | Hongyu Zhang | Shengchao Qin | Shing-Chi Cheung | Cong Tian
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The research in AI-based formal mathematical reasoning has shown an unstoppable growth trend. These studies have excelled in mathematical competitions like IMO and have made significant progress. However, these studies intertwined multiple skills simultaneously—problem-solving, reasoning, and writing formal specifications—making it hard to precisely identify the LLMs’ strengths and weaknesses in each task. This paper focuses on formal verification, an immediate application scenario of formal reasoning, and breaks it down into sub-tasks. We constructed 18k high-quality instruction-response pairs across five mainstream formal specification languages (Coq, Lean4, Dafny, ACSL, and TLA+) in six tasks by distilling gpt-4o and evaluated against ten open-sourced LLMs, including recent popular DeepSeek-R1. We found that LLMs are good at writing proof segments when given either the code, or the detailed description of proof steps. Also, the fine-tuning brought about a nearly threefold improvement at most. And interestingly, we observed that fine-tuning with formal data also enhances abilities in mathematics, reasoning, and coding. We hope our findings inspire further research.