Bin Yu
2026
Formally Specifying the Intended Behavior of the Program: LLM-Driven Neuro-Symbolic Program Specification Synthesis
Cheng Wen | Hu Junjie | YiKun Hu | Jie Su | Bin Yu | Dugang Liu | Zhiwu Xu | Weidi Sun | Shengchao Qin | Cong Tian
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Formal verification can provide strong mathematical guarantees about software correctness, but it typically requires developers to write detailed formal specifications (e.g., contracts and loop invariants), which is costly and error-prone. We introduce AutoSpec+, an LLM-driven neuro-symbolic demonstration system that reframes specification writing as constrained structured synthesis: large language models generate candidate specifications at the granularity of proof-relevant program components, while a symbolic verifier acts as a deterministic critic that checks legality, satisfiability, and proof adequacy, rejecting or repairing candidates in an iterative loop. By replacing unconstrained text generation with this verifier-checked loop, the design substantially reduces hallucinations and produces proof-ready annotations. We evaluate AutoSpec+ on seven benchmark suites, showing that it is highly effective. We release an open-source, Dockerized system with ensemble LLM backends and inter-modular verification support for reproducible demonstration and deployment.
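The generate-and-check loop described above can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the AutoSpec+ implementation: `propose` stands in for the LLM, `check` stands in for the symbolic verifier, and the fields of `Verdict` mirror the three checks named in the abstract (legality, satisfiability, proof adequacy). Repair is modeled here only as regeneration with the critic's feedback. The names `Verdict`, `synthesize`, `toy_propose` and `toy_check` are hypothetical, and the ACSL-style invariant strings are toy inputs.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Verdict:
    """Result of checking one candidate specification (hypothetical structure)."""
    legal: bool         # candidate is well-formed for the specification language
    satisfiable: bool   # candidate is not self-contradictory
    adequate: bool      # candidate is strong enough to discharge the proof
    feedback: str = ""  # critic's message, passed back to the generator

    @property
    def accepted(self) -> bool:
        return self.legal and self.satisfiable and self.adequate


def synthesize(
    propose: Callable[[str], str],
    check: Callable[[str], Verdict],
    max_rounds: int = 5,
) -> Optional[str]:
    """Generate-and-check loop: the generator proposes, the deterministic critic decides."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = propose(feedback)  # stand-in for an LLM call
        verdict = check(candidate)     # stand-in for a symbolic verifier
        if verdict.accepted:
            return candidate
        feedback = verdict.feedback    # rejected: feed the critic's reason back
    return None  # no proof-ready candidate within the budget


# Toy usage: the first invariant is too weak, the second one is accepted.
_candidates: Iterator[str] = iter(["loop invariant 0 <= i;", "loop invariant 0 <= i <= n;"])


def toy_propose(feedback: str) -> str:
    return next(_candidates)


def toy_check(spec: str) -> Verdict:
    strong_enough = "i <= n" in spec
    return Verdict(True, True, strong_enough, "" if strong_enough else "invariant too weak")


print(synthesize(toy_propose, toy_check))  # loop invariant 0 <= i <= n;
```

Keeping the critic deterministic means the same candidate always receives the same verdict, so only the generator contributes randomness to the loop.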
Bridging Kernel Drivers and Virtual Device Models with LLM-Powered Automation
Mingyu Wang | Bin Yu | Wenjian Lu | Zhi Wang | Gao Kefeng | Cheng Wen | Xu Lu | Cong Tian
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Linux kernel device drivers are tightly coupled with hardware, making them difficult to execute and test without physical devices. This severely limits automated code analysis and vulnerability discovery. Manual modeling does not scale, but Large Language Models (LLMs) offer a new way to build virtual devices at scale across the Linux driver ecosystem. In this paper, we present DevGen, an LLM-powered tool that generates QEMU-based virtual devices directly from Linux driver source code. DevGen combines static analysis to gather the necessary context, guides the LLM through step-by-step prompting, and uses an automated self-correction loop driven by compilation and execution feedback. To further reduce errors, similar fixes are retrieved from a library of common modeling failures and incorporated into the repair prompt, supporting more targeted corrections in later iterations. The generated devices are then integrated with QEMU and Syzkaller, enabling driver fuzzing without physical hardware. We evaluate DevGen on 50 PCI/PCIe drivers from Linux 6.18 using three mainstream LLMs; it successfully generates usable models for 44 of them. For 24% of these drivers, fuzzing coverage improves significantly, and fuzzing triggers 7 previously unknown crashes, with 1 CVE assigned. These results demonstrate the practical capability of LLMs to automate complex, system-level code generation tasks.
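The self-correction loop with retrieval of similar fixes can be sketched as follows. This is a minimal illustration under stated assumptions, not DevGen's code: the abstract does not say how similar fixes are retrieved, so Jaccard overlap of identifier-like tokens is our own choice. `generate` and `build_and_run` are hypothetical callbacks standing in for the LLM and for compiling and running the generated QEMU device, and the failure library is a plain dictionary mapping a failure signature to a fix hint.

```python
import re
from typing import Callable, Dict, List, Optional, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased identifier-like tokens, used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_fixes(error_log: str, library: Dict[str, str], k: int = 1) -> List[str]:
    """Return the k fix hints whose failure signature best overlaps the error log."""
    err = tokens(error_log)
    ranked = sorted(library.items(), key=lambda kv: jaccard(err, tokens(kv[0])), reverse=True)
    return [fix for _, fix in ranked[:k]]


def build_repair_prompt(source: str, error_log: str, library: Dict[str, str]) -> str:
    """Combine the failing model, its error log and retrieved fix hints into one prompt."""
    hints = "\n".join(f"- {h}" for h in retrieve_fixes(error_log, library))
    return (
        f"The device model failed with:\n{error_log}\n"
        f"Fixes for similar past failures:\n{hints}\n"
        f"Revise the model below accordingly.\n{source}"
    )


def repair_loop(
    generate: Callable[[str], str],
    build_and_run: Callable[[str], Tuple[bool, str]],
    library: Dict[str, str],
    initial_prompt: str,
    max_iters: int = 3,
) -> Optional[str]:
    """Self-correction loop driven by compilation/execution feedback."""
    prompt = initial_prompt
    for _ in range(max_iters):
        source = generate(prompt)        # stand-in for an LLM call
        ok, log = build_and_run(source)  # stand-in for compiling and running the model
        if ok:
            return source
        prompt = build_repair_prompt(source, log, library)
    return None  # still failing after max_iters attempts
```

Token overlap is cheap and needs no embedding model; a real system might instead use embeddings or structured error categories to match failures.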
2025
Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning
Yanda Chen | Chandan Singh | Xiaodong Liu | Simiao Zuo | Bin Yu | He He | Jianfeng Gao
Proceedings of the 31st International Conference on Computational Linguistics
Large language models (LLMs) often generate convincing, fluent explanations. However, unlike humans, they often generate inconsistent explanations across different inputs. For example, an LLM may explain "all birds can fly" when answering the question "Can sparrows fly?" but at the same time answer "no" to the related question "Can penguins fly?". Explanations should be consistent across related examples so that humans can use them to simulate the LLM's decision process on multiple examples. We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples. EC-finetuning finetunes LLMs on synthetic data that is carefully constructed to contain consistent explanations. Across a variety of question-answering datasets in various domains, EC-finetuning yields a 10.0% relative improvement in explanation consistency on 4 finetuning datasets and generalizes to 7 out-of-distribution datasets not seen during finetuning (+4.5% relative). We will make our code available for reproducibility.
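One way to make "explanation consistency" concrete, following the sparrow/penguin example above, is to check whether the answers an explanation implies for related questions agree with the model's actual answers. The sketch below is an illustrative assumption, not necessarily the metric used in the paper: `simulate` stands in for whatever reader (human or model) infers an answer from the explanation, `model_answer` stands in for the LLM being evaluated, and all function names are hypothetical. It also shows what a relative improvement (such as the reported 10.0%) means arithmetically.

```python
from typing import Callable, Optional, Sequence


def explanation_consistency(
    explanation: str,
    related_questions: Sequence[str],
    simulate: Callable[[str, str], Optional[str]],
    model_answer: Callable[[str], str],
) -> float:
    """Share of related questions on which the answer implied by the explanation
    matches the model's actual answer; questions the explanation does not cover are skipped."""
    covered = 0
    agree = 0
    for question in related_questions:
        implied = simulate(explanation, question)  # what a reader would infer from the explanation
        if implied is None:
            continue
        covered += 1
        agree += int(implied == model_answer(question))
    return agree / covered if covered else 0.0


def relative_improvement(before: float, after: float) -> float:
    """Relative change, e.g. 0.50 -> 0.55 is a 10% relative improvement."""
    return (after - before) / before


# Toy version of the example from the abstract.
BIRDS = ("sparrow", "penguin", "eagle")


def toy_simulate(explanation: str, question: str) -> Optional[str]:
    if "all birds can fly" in explanation.lower() and any(b in question.lower() for b in BIRDS):
        return "yes"
    return None


def toy_model(question: str) -> str:
    return "no" if "penguin" in question.lower() else "yes"


score = explanation_consistency(
    "All birds can fly.", ["Can penguins fly?", "Can eagles fly?"], toy_simulate, toy_model
)
print(score)  # 0.5
```

Here the explanation implies "yes" for penguins while the model answers "no", so only one of the two covered questions is consistent.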