CLeVeR: Multi-modal Contrastive Learning for Vulnerability Code Representation

Jiayuan Li, Lei Cui, Sen Zhao, Yun Yang, Lun Li, Hongsong Zhu


Abstract
Automated vulnerability detection has become increasingly important. Many existing methods use deep learning models to obtain code representations for vulnerability detection. However, these approaches predominantly capture the overall semantics of the code rather than its intrinsic vulnerability-specific semantics. To address this issue, we propose CLeVeR, the first approach that leverages contrastive learning to generate precise vulnerability code representations under the supervision of vulnerability descriptions. Specifically, we introduce an Adapter, a Representation Refinement module, and a Description Simulator to mitigate, respectively, the challenges of semantic misalignment between code and descriptions, semantic imbalance between the two modalities, and input-data inconsistency between the pre-training and fine-tuning stages. On vulnerability detection and classification tasks, CLeVeR achieves F1 scores of 72.82% (real-world dataset) and 80.34%, outperforming state-of-the-art methods (SOTAs) by 11.85% and 13.61%, respectively. CLeVeR also outperforms SOTAs in zero-shot inference, demonstrating the transferability of its vulnerability code representations.
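The abstract describes contrastive pre-training that aligns code embeddings with vulnerability-description embeddings. The paper's actual objective is not reproduced on this page; below is a minimal sketch of the symmetric InfoNCE loss commonly used for such code/text alignment (as in CLIP-style training). The function name, temperature value, and the assumption of two separate encoders producing batched embeddings are illustrative, not CLeVeR's implementation.

```python
# Minimal sketch of a symmetric InfoNCE contrastive loss over a batch of
# (code, description) embedding pairs. Hypothetical helper, not CLeVeR's code.
import torch
import torch.nn.functional as F

def contrastive_loss(code_emb: torch.Tensor,
                     desc_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """code_emb, desc_emb: (batch, dim) outputs of a code encoder and a
    text encoder; matching pairs share the same batch index."""
    code_emb = F.normalize(code_emb, dim=-1)
    desc_emb = F.normalize(desc_emb, dim=-1)
    # Pairwise cosine similarities; matched pairs lie on the diagonal.
    logits = code_emb @ desc_emb.t() / temperature   # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)
    # Pull each code snippet toward its own description and vice versa,
    # pushing it away from all other descriptions/snippets in the batch.
    loss_c2d = F.cross_entropy(logits, targets)
    loss_d2c = F.cross_entropy(logits.t(), targets)
    return (loss_c2d + loss_d2c) / 2
```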
Anthology ID:
2025.findings-acl.414
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
7940–7951
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.414/
DOI:
10.18653/v1/2025.findings-acl.414
Cite (ACL):
Jiayuan Li, Lei Cui, Sen Zhao, Yun Yang, Lun Li, and Hongsong Zhu. 2025. CLeVeR: Multi-modal Contrastive Learning for Vulnerability Code Representation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 7940–7951, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
CLeVeR: Multi-modal Contrastive Learning for Vulnerability Code Representation (Li et al., Findings 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.findings-acl.414.pdf