JaeSung Lee
Also published as: Jae-Sung Lee, Jaesung Lee
2025
BitAbuse: A Dataset of Visually Perturbed Texts for Defending Phishing Attacks
Hanyong Lee | Chaelyn Lee | Yongjae Lee | Jaesung Lee
Findings of the Association for Computational Linguistics: NAACL 2025
Phishing often targets victims through visually perturbed texts to bypass security systems. The noise in these texts functions as an adversarial attack, designed to deceive language models and hinder their ability to interpret the content accurately. Because sufficient phishing cases are difficult to obtain, previous studies have relied on synthetic datasets that contain no real-world cases. In this study, we propose the BitAbuse dataset, which includes real-world phishing cases, to address this limitation. Our dataset comprises a total of 325,580 visually perturbed texts. Its inputs are drawn from the raw corpus and consist of visually perturbed sentences together with sentences generated through an artificial perturbation process. Each input sentence is labeled with its ground truth, the restored, non-perturbed version. Language models trained on our dataset performed significantly better than previous methods, achieving an accuracy of approximately 96%. Our analysis revealed a significant gap between real-world and synthetic examples, underscoring the value of our dataset for building reliable pre-trained models for restoration tasks. We release the BitAbuse dataset, with its real-world phishing cases annotated with visual perturbations, to support future research in adversarial attack defense.
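To make the input/label pairing described in the abstract concrete, the following is a minimal sketch of an artificial perturbation step, assuming that perturbation means swapping Latin letters for look-alike Unicode characters. The `HOMOGLYPHS` table, the function names, and the `input`/`label` field names are hypothetical and are not taken from the BitAbuse release; the lookup-table `restore` is only a naive baseline for illustration, not the language-model restoration evaluated in the paper.

```python
import random

# Hypothetical look-alike table (an assumption, not the dataset's actual mapping):
# Latin letters mapped to visually similar Cyrillic code points.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "c": "\u0441",  # Cyrillic small es
    "e": "\u0435",  # Cyrillic small ie
    "i": "\u0456",  # Cyrillic small Byelorussian-Ukrainian i
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
}
REVERSE = {fake: real for real, fake in HOMOGLYPHS.items()}


def perturb(text, rate, rng):
    """Replace each mappable character with its look-alike with probability `rate`."""
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


def restore(text):
    """Naive baseline: map known look-alikes back to their Latin letters."""
    return "".join(REVERSE.get(ch, ch) for ch in text)


def make_pair(text, rate=0.5, seed=0):
    """Build one example: a perturbed input labeled with its clean ground truth."""
    rng = random.Random(seed)
    return {"input": perturb(text, rate, rng), "label": text}


if __name__ == "__main__":
    pair = make_pair("bitcoin price", rate=1.0)
    print(pair["input"] == pair["label"])           # perturbed text differs from the label
    print(restore(pair["input"]) == pair["label"])  # baseline recovers the clean text
```

A fixed table like this only undoes substitutions it already knows about, which is one reason a learned restoration model is attractive for perturbations found in real phishing text.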
1996
A Logical Structure for the Construction of Machine Readable Dictionaries
Byung-Jin Choi | Jae-Sung Lee | Woon-Jae Lee | Key-Sun Choi
Proceedings of the 11th Pacific Asia Conference on Language, Information and Computation
Co-authors
- Byung-Jin Choi 1
- Key-Sun Choi 1
- Hanyong Lee 1
- Chaelyn Lee 1
- Yongjae Lee 1