JaeSung Lee

Also published as: Jae-Sung Lee, Jae Sung Lee, Jaesung Lee

2025

BitAbuse: A Dataset of Visually Perturbed Texts for Defending Phishing Attacks
Hanyong Lee | Chaelyn Lee | Yongjae Lee | Jaesung Lee
Findings of the Association for Computational Linguistics: NAACL 2025

Phishing often targets victims through visually perturbed texts to bypass security systems. The noise contained in these texts functions as an adversarial attack, designed to deceive language models and hinder their ability to accurately interpret the content. However, since it is difficult to obtain sufficient phishing cases, previous studies have used synthetic datasets that do not contain real-world cases. In this study, we propose the BitAbuse dataset, which includes real-world phishing cases, to address the limitations of previous research. Our dataset comprises a total of 325,580 visually perturbed texts. The dataset inputs are drawn from the raw corpus, consisting of visually perturbed sentences and sentences generated through an artificial perturbation process. Each input sentence is labeled with its corresponding ground truth, representing the restored, non-perturbed version. Language models trained on our proposed dataset demonstrated significantly better performance compared to previous methods, achieving an accuracy of approximately 96%. Our analysis revealed a significant gap between real-world and synthetic examples, underscoring the value of our dataset for building reliable pre-trained models for restoration tasks. We release the BitAbuse dataset, which includes real-world phishing cases annotated with visual perturbations, to support future research in adversarial attack defense.

2020

BERT-based Spatial Information Extraction
Hyeong Jin Shin | Jeong Yeon Park | Dae Bum Yuk | Jae Sung Lee
Proceedings of the Third International Workshop on Spatial Language Understanding

Spatial information extraction is essential to understanding geographical information in text. This task is largely divided into two subtasks: spatial element extraction and spatial relation extraction. In this paper, we utilize BERT (Devlin et al., 2018), which is very effective for many natural language processing applications. We propose a BERT-based spatial information extraction model, which uses BERT for spatial element extraction and R-BERT (Wu and He, 2019) for spatial relation extraction. The model was evaluated with the SemEval 2015 dataset. The results showed a 15.4 percentage-point increase in spatial element extraction and an 8.2 percentage-point increase in spatial relation extraction in comparison to the baseline model (Nichols and Botros, 2015).

2019

CBNU System for SIGMORPHON 2019 Shared Task 2: a Pipeline Model
Uygun Shadikhodjaev | Jae Sung Lee
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

In this paper, we describe our system for morphological analysis and lemmatization in context, using a transformer-based sequence-to-sequence model and a biaffine-attention-based BiLSTM model. First, a lemma is produced for a given word, and then both the lemma and the given word are used for morphological analysis. We also make use of character-level word encodings and trainable encodings to improve accuracy. Overall, our system ranked fifth in lemmatization and sixth in morphological accuracy among twelve systems, and demonstrated considerable improvements over the baseline in morphological analysis.

2016

Extracting Spatial Entities and Relations in Korean Text
Bogyum Kim | Jae Sung Lee
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

A spatial information extraction system retrieves spatial entities and their relationships for geological searches and reasoning. Spatial information systems have been developed mainly for English text, e.g., through the SpaceEval competition. Some of the techniques are useful but not directly applicable to Korean text because of linguistic differences and the lack of language resources. In this paper, we propose a Korean spatial entity extraction model and a spatial relation extraction model. The spatial entity extraction model uses word vectors to alleviate overgeneration, and the spatial relation extraction model uses dependency parse labels to find the proper arguments in relations. Experiments with Korean text show that the two models are effective for spatial information extraction.
