Chanhyuk Yoon


2026

Voice phishing is an evolving form of social engineering crime that requires the continuous advancement of detection technologies. We introduce a benchmark dataset designed to evaluate the practical performance of AI-based voice phishing detection models. The dataset includes diverse voice conversation scenarios and supports four evaluation tasks for assessing open-source language models. Experimental results show that while some large-scale models demonstrate stable performance across multiple tasks, accuracy remains low in topic classification and dialogue structure recognition regardless of model size. These findings highlight the complexity of voice phishing detection, which demands contextual reasoning and an understanding of dialogue structure beyond simple sentence-level comprehension. The proposed benchmark dataset provides a foundation for more robust evaluation and development of AI systems capable of detecting deceptive voice interactions, contributing to safer and more trustworthy communication environments.

2024

Large language models (LLMs) are pretrained to predict the next word; however, scaling them up requires significant computing resources. Numerous big tech companies and research institutes have developed multilingual LLMs (MLLMs) to meet current demands, but these often overlook less-resourced languages (LRLs). This study proposed three strategies to enhance LRL performance based on publicly available MLLMs. First, the MLLM vocabulary was expanded with LRL tokens to enhance expressiveness. Second, bilingual data were used for pretraining to align the high- and less-resourced languages. Third, a high-quality small-scale instruction dataset was constructed and instruction tuning was performed to augment the LRL. The experiments employed the Llama2 model with Korean as the LRL; the resulting model was quantitatively evaluated against other publicly released LLMs across eight tasks. Furthermore, a qualitative assessment was performed based on human evaluation and GPT-4. Experimental results showed that our proposed Bllossom model exhibited superior performance in qualitative analyses compared to previously proposed Korean monolingual models.
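The first strategy (vocabulary expansion) can be illustrated with a toy greedy tokenizer. This is a minimal sketch, not the authors' implementation: the vocabulary, example strings, and the `tokenize` helper are all hypothetical. It shows why a Korean sentence that an English-centric vocabulary can only represent as byte-level fallback tokens becomes far more compact once whole Korean subwords are added, which is the effect vocabulary expansion has on a real MLLM tokenizer.

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization; characters not covered by the
    vocabulary fall back to UTF-8 byte tokens (as in Llama-style tokenizers)."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary match starting at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No match: emit one byte token per UTF-8 byte of this character.
            tokens.extend(f"<0x{b:02X}>" for b in text[i].encode("utf-8"))
            i += 1
    return tokens

base_vocab = {"Hello", " world"}          # English-centric base vocabulary
korean_text = "안녕하세요"                  # "Hello" in Korean

before = tokenize(korean_text, base_vocab)        # 15 byte-fallback tokens
expanded_vocab = base_vocab | {"안녕", "하세요"}    # vocabulary expansion step
after = tokenize(korean_text, expanded_vocab)     # 2 whole-subword tokens
```

In a real setup the added tokens also require growing the model's embedding matrix (e.g. `model.resize_token_embeddings(len(tokenizer))` in HuggingFace Transformers), and the new embeddings are then trained during the bilingual pretraining stage.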