2025
pdf
bib
abs
Q-FAKER: Query-free Hard Black-box Attack via Controlled Generation
CheolWon Na
|
YunSeok Choi
|
Jee-Hyong Lee
Findings of the Association for Computational Linguistics: NAACL 2025
Many adversarial attack approaches are proposed to verify the vulnerability of language models. However, they require numerous queries and the information on the target model. Even black-box attack methods also require the target model’s output information. They are not applicable in real-world scenarios, as in hard black-box settings where the target model is closed and inaccessible. Even the recently proposed hard black-box attacks still require many queries and demand extremely high costs for training adversarial generators. To address these challenges, we propose Q-faker (Query-free Hard Black-box Attacker), a novel and efficient method that generates adversarial examples without accessing the target model. To avoid accessing the target model, we use a surrogate model instead. The surrogate model generates adversarial sentences for a target-agnostic attack. During this process, we leverage controlled generation techniques. We evaluate our proposed method on eight datasets. Experimental results demonstrate our method’s effectiveness including high transferability and the high quality of the generated adversarial examples, and prove its practical in hard black-box settings.
pdf
bib
abs
CoRAC: Integrating Selective API Document Retrieval with Question Semantic Intent for Code Question Answering
YunSeok Choi
|
CheolWon Na
|
Jee-Hyong Lee
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Automatic code question answering aims to generate precise answers to questions about code by analyzing code snippets. To provide an appropriate answer, it is necessary to accurately understand the relevant part of the code and correctly interpret the intent of the question. However, in real-world scenarios, the questioner often provides only a portion of the code along with the question, making it challenging to find an answer. The responder should be capable of providing a suitable answer using such limited information. We propose a knowledge-based framework, CoRAC, an automatic code question responder that enhances understanding through selective API document retrieval and question semantic intent clustering. We evaluate our method on three real-world benchmark datasets and demonstrate its effectiveness through various experiments. We also show that our method can generate high-quality answers compared to large language models, such as ChatGPT.
2023
pdf
bib
abs
DIP: Dead code Insertion based Black-box Attack for Programming Language Model
CheolWon Na
|
YunSeok Choi
|
Jee-Hyong Lee
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Automatic processing of source code, such as code clone detection and software vulnerability detection, is very helpful to software engineers. Large pre-trained Programming Language (PL) models (such as CodeBERT, GraphCodeBERT, CodeT5, etc.), show very powerful performance on these tasks. However, these PL models are vulnerable to adversarial examples that are generated with slight perturbation. Unlike natural language, an adversarial example of code must be semantic-preserving and compilable. Due to the requirements, it is hard to directly apply the existing attack methods for natural language models. In this paper, we propose DIP (Dead code Insertion based Black-box Attack for Programming Language Model), a high-performance and effective black-box attack method to generate adversarial examples using dead code insertion. We evaluate our proposed method on 9 victim downstream-task large code models. Our method outperforms the state-of-the-art black-box attack in both attack efficiency and attack quality, while generated adversarial examples are compiled preserving semantic functionality.
2021
pdf
bib
Learning Sequential and Structural Information for Source Code Summarization
YunSeok Choi
|
JinYeong Bak
|
CheolWon Na
|
Jee-Hyong Lee
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021