Li Kuang
2026
VerilogLAVD: LLM-Aided Pattern Generation for Verilog CWE Detection
Xiang Long | Yingjie Xia | Li Kuang | Yao Wan | ZiHao Liu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
LLMs often fail at hardware vulnerability detection because of the intrinsic semantic concurrency of hardware description languages (HDLs), where vulnerabilities arise from the interaction of multiple concurrently executing statements rather than from a single sequential execution path. To address this problem, we propose VerilogLAVD, an LLM-Aided Vulnerability Detection framework that generates executable Traversal Detection Patterns (TDPs), i.e., rules describing how to find evidence of vulnerabilities in Verilog HDL. We first introduce a Unified Verilog Property Graph (VeriPG) that explicitly models parallel semantics by combining the AST, CFG, and DDG. Furthermore, a semantic validation mechanism is designed to constrain and filter the LLM-generated TDPs. By executing these validated TDPs on VeriPG, our method produces stable and deterministic detection results. Experiments demonstrate that VerilogLAVD improves the F1 score by 133% compared to LLM-based methods. In addition, the framework successfully identifies real-world hardware vulnerabilities in open-source hardware design repositories.
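To make the idea of "executing a traversal pattern on a property graph" concrete, here is a minimal sketch, assuming a toy graph whose nodes carry a `type` attribute and whose edges are labeled `AST`, `CFG`, or `DDG`. The schema, the `PropertyGraph` class, and the `multi_driven_signals` rule are all hypothetical illustrations, not VeriPG or an actual generated TDP. The rule flags a signal assigned inside more than one `always` block, a pattern that only shows up when the concurrent blocks are considered together; this toy rule walks only `AST` edges.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Toy property graph; the node/edge schema is hypothetical, not VeriPG's."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: list = field(default_factory=list)  # (src, dst, kind), kind in {"AST", "CFG", "DDG"}

    def add_node(self, nid, **attrs):
        self.nodes[nid] = attrs

    def add_edge(self, src, dst, kind):
        self.edges.append((src, dst, kind))

    def successors(self, nid, kind):
        return [dst for src, dst, k in self.edges if src == nid and k == kind]


def multi_driven_signals(graph):
    """Hypothetical TDP-style rule: flag signals assigned in more than one always block."""
    drivers = defaultdict(set)
    for nid, attrs in graph.nodes.items():
        if attrs.get("type") != "always":
            continue
        # Each always block is a concurrent process; walk its AST subtree.
        stack = [nid]
        while stack:
            cur = stack.pop()
            if graph.nodes[cur].get("type") == "assign":
                drivers[graph.nodes[cur]["lhs"]].add(nid)
            stack.extend(graph.successors(cur, "AST"))
    return {sig: sorted(blocks) for sig, blocks in drivers.items() if len(blocks) > 1}


# Two concurrent always blocks that both drive `lock`.
g = PropertyGraph()
g.add_node("blk1", type="always")
g.add_node("blk2", type="always")
g.add_node("st1", type="assign", lhs="lock")
g.add_node("st2", type="assign", lhs="lock")
g.add_node("st3", type="assign", lhs="data")
g.add_edge("blk1", "st1", "AST")
g.add_edge("blk2", "st2", "AST")
g.add_edge("blk2", "st3", "AST")
g.add_edge("st2", "st3", "CFG")  # sequential order inside blk2 (unused by this rule)
print(multi_driven_signals(g))  # {'lock': ['blk1', 'blk2']}
```

Because the rule is ordinary code run over a fixed graph, the same input always yields the same report, which is the kind of determinism the abstract contrasts with direct LLM judgments.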
CascadeFix: Multi-Location Program Repair via Cascading Planning and Generation
Huan Zhang | Li Kuang | Yang Yang | Yilei Fang | Yingjie Xia
Findings of the Association for Computational Linguistics: ACL 2026
Automated Program Repair (APR) is vital for software maintenance. Despite notable advances, existing methods still suffer from insufficient modeling of bug dependencies and inadequate global repair planning when addressing semantically complex multi-location bugs. We propose CascadeFix, a multi-location automated repair method based on cascading planning and generation. First, to better model semantic and structural dependencies among bugs, three types of bug relationships (Use, Copy, and Nearby) are defined to characterize semantic connection, patch reusability, and contextual interference. Then, to address inadequate global repair planning, a cascading repair planning algorithm is designed to cluster strongly correlated bugs and assign appropriate repair priorities and operations to each cluster, ensuring the rationality and consistency of the global repair. Finally, taking clusters as the basic repair units, a cascading patch generation mechanism is proposed to dynamically integrate intra-cluster dependency information and cross-cluster repair knowledge, producing patches that maintain syntactic correctness and semantic consistency under global dependency constraints. Experiments on Defects4J show that CascadeFix resolves 84 multi-location bugs, a 31% improvement over current state-of-the-art methods.
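One way to picture "cluster correlated bugs, then order the clusters" is the sketch below. It is an illustration under stated assumptions, not CascadeFix's planning algorithm: it assumes relations arrive as `(a, b, kind)` triples, treats `Copy` and `Nearby` as the "strong" links that merge bugs into one cluster (union-find), and reads `(a, b, "Use")` as "bug a uses code at bug b", so b's cluster is repaired first (topological sort via the standard-library `graphlib`). The function names and the direction convention are hypothetical.

```python
from graphlib import TopologicalSorter


def cluster_bugs(bugs, relations, strong=("Copy", "Nearby")):
    """Union-find over the 'strong' relation kinds (a hypothetical choice)."""
    parent = {b: b for b in bugs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, kind in relations:
        if kind in strong:
            parent[find(a)] = find(b)
    clusters = {}
    for b in bugs:
        clusters.setdefault(find(b), []).append(b)
    return list(clusters.values())


def plan_order(clusters, relations):
    """Order clusters so that a cluster whose code is used is repaired before its users."""
    cluster_of = {b: i for i, c in enumerate(clusters) for b in c}
    sorter = TopologicalSorter({i: set() for i in range(len(clusters))})
    for a, b, kind in relations:
        # (a, b, "Use"): bug a uses code at bug b, so b's cluster goes first.
        if kind == "Use" and cluster_of[a] != cluster_of[b]:
            sorter.add(cluster_of[a], cluster_of[b])
    return [clusters[i] for i in sorter.static_order()]


bugs = ["B1", "B2", "B3", "B4"]
relations = [("B1", "B2", "Copy"), ("B1", "B3", "Use"), ("B4", "B3", "Nearby")]
clusters = cluster_bugs(bugs, relations)
print(clusters)                         # [['B1', 'B2'], ['B3', 'B4']]
print(plan_order(clusters, relations))  # [['B3', 'B4'], ['B1', 'B2']]
```

If the cross-cluster `Use` edges form a cycle, `static_order()` raises `graphlib.CycleError`; a real planner would need a policy for that case, which this sketch does not attempt.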
2025
Re3Syn: A Dependency-Based Data Synthesis Framework for Long-Context Post-training
Zhiyang Zhang | Ziqiang Liu | Huiming Wang | Renke Shan | Li Kuang | Lu Wang | De Wen Soh
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
An important trend in large language models (LLMs) is the development of longer context windows. However, training LLMs with long context windows to model lengthy inputs effectively is often hindered by the scarcity of naturally long-context data. Existing methods that construct long-context data by concatenating short documents overlook a crucial aspect of long-context data quality, namely semantic dependency. In this paper, we propose a novel framework called Retrieval, Dependency Recognition, and Reorder for data synthesis (Re3Syn), which leverages semantic similarity to retrieve relevant documents and group them into batches. Within each batch, the framework recognizes dependencies among documents and uses them, together with a reorder algorithm, to organize the short documents into coherent long-context data. Comprehensive experiments on multiple benchmarks indicate that the data generated by Re3Syn has longer dependencies and significantly enhances the model’s long-context capabilities. For reproducibility, we will release our codebase upon acceptance.
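The retrieve-then-reorder pipeline can be sketched as follows, assuming toy embedding vectors in place of a real text encoder and a hand-written `depends_on` map in place of the dependency recognition step (which the abstract does not specify). `retrieve_batch` groups a seed document with its most cosine-similar neighbors, and `reorder` uses a topological sort so every document appears after the documents it depends on. Names and data are hypothetical, and topological sorting is a stand-in for whatever reorder algorithm Re3Syn actually uses.

```python
import math
from graphlib import TopologicalSorter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_batch(seed, embeddings, k):
    """Group a seed document with its k most similar documents."""
    others = sorted((d for d in embeddings if d != seed),
                    key=lambda d: cosine(embeddings[seed], embeddings[d]),
                    reverse=True)
    return [seed] + others[:k]


def reorder(batch, depends_on):
    """Place every document after the in-batch documents it depends on."""
    graph = {d: {p for p in depends_on.get(d, ()) if p in batch} for d in batch}
    return list(TopologicalSorter(graph).static_order())


embeddings = {  # toy vectors standing in for a real text encoder
    "intro": [1.0, 0.0, 0.1],
    "method": [0.9, 0.1, 0.2],
    "recipe": [0.0, 1.0, 0.0],
    "results": [0.8, 0.0, 0.3],
}
depends_on = {"results": ["method"], "method": ["intro"]}  # assumed, not recognized

batch = retrieve_batch("results", embeddings, k=2)
print(batch)                       # ['results', 'method', 'intro']
print(reorder(batch, depends_on))  # ['intro', 'method', 'results']
```

Retrieval alone groups related documents but leaves them in similarity order; the reorder step is what turns the batch into a sequence where later documents can build on earlier ones.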
CSTree-SRI: Introspection-Driven Cognitive Semantic Tree for Multi-Turn Question Answering over Extra-Long Contexts
Zhaowen Wang | Xiang Wei | Kangshao Du | Yiting Zhang | Libo Qin | Yingjie Xia | Li Kuang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have achieved remarkable success in natural language processing (NLP), particularly in single-turn question answering (QA) over short texts. However, their performance declines significantly in multi-turn QA over extra-long contexts (ELC), as they struggle to capture logical correlations across multiple chunks of an ELC and to maintain coherence across multi-turn questions. To address these challenges, we propose the CSTree-SRI framework (Cognitive Semantic Tree through Summarization, Retrieval, and Introspection). CSTree-SRI dynamically constructs the CSTree to preserve logical coherence within ELC through hierarchical synthesis and introspective validation. A logic-driven traversal strategy on the CSTree is then designed to provide efficient information retrieval for question answering. Additionally, we construct a suite of multi-turn QA datasets and an evaluation benchmark tailored to ELC tasks, and comprehensive experiments demonstrate the framework’s superiority in addressing the challenges of multi-turn QA over ELC.
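The general shape of a summary tree with top-down retrieval is sketched below, with heavy simplifications: `summarize` is a placeholder that merges word sets (an LLM would produce real summaries), `retrieve` descends greedily by word overlap with the query (a stand-in, not the paper's logic-driven traversal), and introspective validation is not modeled at all. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)


def summarize(texts):
    """Placeholder for an LLM summarizer: keeps each distinct word once, in order."""
    seen = []
    for t in texts:
        for w in t.lower().split():
            if w not in seen:
                seen.append(w)
    return " ".join(seen)


def build_tree(chunks, fanout=2):
    """Bottom-up: group adjacent nodes and summarize each group into a parent."""
    if not chunks or fanout < 2:
        raise ValueError("need at least one chunk and fanout >= 2")
    level = [Node(c) for c in chunks]
    while len(level) > 1:
        level = [Node(summarize([n.text for n in level[i:i + fanout]]), level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


def overlap(query, text):
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(root, query):
    """Greedy top-down descent to the leaf chunk whose ancestors best match the query."""
    node = root
    while node.children:
        node = max(node.children, key=lambda n: overlap(query, n.text))
    return node.text


chunks = [
    "alice founded the lab in paris",
    "the lab moved to berlin later",
    "bob joined the lab as engineer",
    "bob wrote the compiler in rust",
]
root = build_tree(chunks)
print(retrieve(root, "who wrote the compiler"))  # bob wrote the compiler in rust
```

Descending through summaries touches one path of the tree per query instead of scanning every chunk, which is the efficiency argument for tree-based retrieval over long contexts.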
2024
Breaking the Hourglass Phenomenon of Residual Quantization: Enhancing the Upper Bound of Generative Retrieval
Zhirui Kuai | Zuxu Chen | Huimu Wang | Mingming Li | Dadong Miao | Wang Binbin | Xusong Chen | Li Kuang | Yuxing Han | Jiaxing Wang | Guoyu Tang | Lin Liu | Songlin Wang | Jingwei Zhuo
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Generative retrieval (GR) has emerged as a transformative paradigm in search and recommender systems, leveraging numeric identifier representations to enhance efficiency and generalization. Notably, methods like TIGER, which employ Residual Quantization-based Semantic Identifiers (RQ-SID), have shown significant promise in e-commerce scenarios by effectively managing item IDs. However, a critical issue, termed the “Hourglass” phenomenon, occurs in RQ-SID: intermediate codebook tokens become overly concentrated, hindering the full utilization of generative retrieval methods. This paper analyzes and addresses this problem, identifying data sparsity and long-tailed distribution as its primary causes. Through comprehensive experiments and detailed ablation studies, we analyze the impact of these factors on codebook utilization and data distribution. Our findings reveal that the “Hourglass” phenomenon substantially degrades the performance of RQ-SID in generative retrieval. We propose effective solutions to mitigate this issue, significantly enhancing the effectiveness of generative retrieval in real-world e-commerce applications.
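To see what "intermediate codebook tokens become overly concentrated" means operationally, the sketch below encodes items with plain residual quantization (each layer quantizes the residual left by the previous layers) and reports, per layer, the fraction of codewords used and the normalized entropy of code usage. The codebooks are hand-picked rather than learned (TIGER-style systems learn them, e.g. with an RQ-VAE), and the items were chosen so the middle layer collapses onto a single token; this only illustrates what the statistics measure and does not reproduce the paper's analysis. All names are hypothetical.

```python
import math
from collections import Counter


def nearest(vec, codebook):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))


def residual_quantize(vec, codebooks):
    """One code per layer; each layer quantizes what the previous layers left over."""
    codes, residual = [], list(vec)
    for cb in codebooks:
        idx = nearest(residual, cb)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def layer_stats(all_codes, codebook_sizes):
    """Per layer: fraction of codewords used and normalized entropy of their usage."""
    stats = []
    for layer, size in enumerate(codebook_sizes):
        counts = Counter(codes[layer] for codes in all_codes)
        n = sum(counts.values())
        entropy = sum(c / n * math.log(n / c) for c in counts.values())
        stats.append((len(counts) / size, entropy / math.log(size)))
    return stats


# Hand-picked (not learned) codebooks: coarse, medium, fine.
codebooks = [
    [[0, 0], [10, 0], [0, 10], [10, 10]],
    [[0, 0], [1, 0], [0, 1], [1, 1]],
    [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1]],
]
items = [[10, 0], [0.1, 10.1], [10.1, 10.0], [0, 0.1]]
all_codes = [residual_quantize(v, codebooks) for v in items]
print(all_codes)  # [[1, 0, 0], [2, 0, 3], [3, 0, 1], [0, 0, 2]]
stats = layer_stats(all_codes, [len(cb) for cb in codebooks])
print([(u, round(h, 3)) for u, h in stats])  # [(1.0, 1.0), (0.25, 0.0), (1.0, 1.0)]
```

The "wide, narrow, wide" pattern in the second printout is the hourglass shape: the first and last layers spread items across all codewords, while every item shares the same middle token, so that token carries no information for distinguishing items.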
Co-authors
- Yingjie Xia 3
- Wang Binbin 1
- Zuxu Chen 1
- Xusong Chen 1
- Kangshao Du 1
- Yilei Fang 1
- Yuxing Han 1
- Zhirui Kuai 1
- Mingming Li 1
- ZiHao Liu 1
- Ziqiang Liu 1
- Lin Liu 1
- Xiang Long 1
- Dadong Miao 1
- Libo Qin 1
- Renke Shan 1
- De Wen Soh 1
- Guoyu Tang 1
- Yao Wan 1
- Huiming Wang 1
- Lu Wang 1
- Zhaowen Wang 1
- Huimu Wang 1
- Jiaxing Wang 1
- Songlin Wang 1
- Xiang Wei 1
- Yang Yang 1
- Zhiyang Zhang 1
- Yiting Zhang 1
- Huan Zhang 1
- Jingwei Zhuo 1