David Lo
2026
SecureVibeBench: Benchmarking Secure Vibe Coding of AI Agents via Reconstructing Vulnerability-Introducing Scenarios
Junkai Chen | Huihui Huang | Yunbo Lyu | Junwen An | Jieke Shi | Chengran Yang | Ting Zhang | Haoye Tian | Yikun Li | Zhenhao Li | Xin Zhou | Xing Hu | David Lo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language model-powered code agents are rapidly transforming software engineering, yet the security risks of their generated code have become a critical concern. Existing benchmarks have provided valuable insights, but they fail to capture scenarios in which vulnerabilities are actually introduced by human developers, making fair comparisons between humans and agents infeasible. We therefore introduce SecureVibeBench, a benchmark for code agents comprising 105 C/C++ secure coding tasks sourced from 41 projects in OSS-Fuzz. SecureVibeBench has the following features: (i) realistic task settings that require multi-file edits in large repositories, (ii) aligned contexts based on real-world open-source vulnerabilities with precisely identified vulnerability introduction points, and (iii) comprehensive evaluation that combines functionality testing and security checking with both static and dynamic oracles. We evaluate 5 popular code agents (e.g., OpenHands), backed by 5 LLMs (e.g., Claude Sonnet 4.5), on SecureVibeBench. Results show that current agents struggle to produce code that is both correct and secure: even the best-performing one produces merely 23.8% correct and secure solutions on SecureVibeBench.
MultiCodeAttack: Iterative Jailbreak Attacking on LLMs with Multi-Code Prompt Injection
Weifeng Sun | Meng Yan | Zhou Yang | Yuchen Chen | Song Sun | David Lo
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) demonstrate strong generalization capabilities but remain vulnerable to jailbreak attacks that induce restricted text or malicious code generation. Recent structured jailbreaks embed adversarial intent into code-like templates and have demonstrated promising effectiveness. However, existing approaches typically operate within a fixed template design and a single programming language, without considering language diversity or adaptive template evolution, thereby limiting the exploration of cross-language jailbreak behaviors. In this paper, we present MultiCodeAttack, a structured jailbreak framework that systematically explores and optimizes multi-language code templates. MultiCodeAttack maintains a diverse template library across programming languages, dynamically selects languages with higher attack effectiveness via a multi-armed bandit strategy, and evolves templates through semantic-preserving mutation guided by response-aware signals. Extensive experiments on 8 LLMs show that MultiCodeAttack outperforms existing jailbreak baselines, achieving 28.23%–832.59% higher harmful text generation. On malicious code generation across 11 LLMs, MultiCodeAttack produces up to 136.22% more malicious outputs than the baseline methods. Our code is available at https://anonymous.4open.science/r/MultiCodeAttack/.
SeCuRepair: Semantics-Aligned, Curriculum-Driven, and Reasoning-Enhanced Vulnerability Repair Framework
Chengran Yang | Ting Zhang | Jinfeng Jiang | Xin Zhou | Haoye Tian | Mingzhe Du | Jieke Shi | Junkai Chen | Yikun Li | Eng Lieh Ouh | Lwin Khin Shar | David Lo
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The rapid accumulation of software vulnerabilities has outpaced manual remediation, creating an urgent need for Automated Vulnerability Repair (AVR). However, existing methods suffer from syntactic overfitting, mimicking surface forms without understanding the underlying repair logic, and fail to generalize to complex fixes. To transcend these limitations, we propose SeCuRepair, a reliable, scalable, and efficient RL-based AVR framework. By introducing a semantic-aware reward, SeCuRepair optimizes for code semantic equivalence rather than lexical mimicry. Furthermore, SeCuRepair incorporates an expert-aligned reasoning mechanism that explicitly grounds patch generation in a structured diagnosis. Finally, SeCuRepair introduces a difficulty-based curriculum that progressively disentangles the optimization barriers of entangled multi-hunk repairs. Extensive evaluations on a rigorous repository-level split show that SeCuRepair substantially outperforms state-of-the-art baselines, as confirmed by both automatic evaluation and human study.
2025
TACLR: A Scalable and Efficient Retrieval-based Method for Industrial Product Attribute Value Identification
Yindu Su | Huike Zou | Lin Sun | Ting Zhang | Haiyang Yang | Chen Li Yu | David Lo | Qingheng Zhang | Shuguang Han | Jufeng Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Product Attribute Value Identification (PAVI) involves identifying attribute values from product profiles, a key task for improving product search, recommendation, and business analytics on e-commerce platforms. However, existing PAVI methods face critical challenges, such as inferring implicit values, handling out-of-distribution (OOD) values, and producing normalized outputs. To address these limitations, we introduce Taxonomy-Aware Contrastive Learning Retrieval (TACLR), the first retrieval-based method for PAVI. TACLR formulates PAVI as an information retrieval task by encoding product profiles and candidate values into embeddings and retrieving values based on their similarity. It leverages contrastive training with taxonomy-aware hard negative sampling and employs adaptive inference with dynamic thresholds. TACLR offers three key advantages: (1) it effectively handles implicit and OOD values while producing normalized outputs; (2) it scales to thousands of categories, tens of thousands of attributes, and millions of values; and (3) it supports efficient inference for high-load industrial deployment. Extensive experiments on proprietary and public datasets validate the effectiveness and efficiency of TACLR. Further, it has been successfully deployed on the real-world e-commerce platform Xianyu, processing millions of product listings daily with frequently updated, large-scale attribute taxonomies. We release the code to facilitate reproducibility and future research at https://github.com/SuYindu/TACLR.
2009
Co-authors
- Junkai Chen 2
- Yikun Li 2
- Jieke Shi 2
- Haoye Tian 2
- Chengran Yang 2
- Ting Zhang 2
- Xin Zhou 2
- Junwen An 1
- Jufeng Chen 1
- Yuchen Chen 1
- Mingzhe Du 1
- Shuguang Han 1
- Xing Hu 1
- Huihui Huang 1
- Jing Jiang 1
- Jinfeng Jiang 1
- Zhenhao Li 1
- Yunbo Lyu 1
- Hong Mei 1
- Eng Lieh Ouh 1
- Lwin Khin Shar 1
- Yindu Su 1
- Lin Sun 1
- Weifeng Sun 1
- Song Sun 1
- Xiaoyin Wang 1
- Meng Yan 1
- Haiyang Yang 1
- Zhou Yang 1
- Chen Li Yu 1
- Ting Zhang 1
- Qingheng Zhang 1
- Lu Zhang 1
- Huike Zou 1