Yongbin Jeong
2026
Towards Safer Calls for Everyone: Designing a Benchmark Dataset for Evaluating Voice Phishing Detection Models
Joeun Kang | Gyuri Choi | Chanhyuk Yoon | Yongbin Jeong | Younggyun Hahm | Shea Husband | Hansaem Kim
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Voice phishing is an evolving form of social engineering crime that requires the continuous advancement of detection technologies. We introduce a benchmark dataset designed to evaluate the practical performance of AI-based voice phishing detection models. The dataset includes diverse voice conversation scenarios and supports four evaluation tasks to assess open-source language models. Experimental results show that while some large-scale models demonstrate stable performance across multiple tasks, accuracy remains low in topic classification and dialogue structure recognition, regardless of model size. These findings highlight the complexity of voice phishing detection, which demands contextual reasoning and dialogue structure understanding beyond simple sentence-level comprehension. The proposed benchmark dataset provides a foundation for more robust evaluation and development of AI systems capable of detecting deceptive voice interactions, contributing to safer and more trustworthy communication environments.
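The abstract describes scoring models on classification-style tasks over labeled dialogue scenarios. A minimal sketch of such an evaluation loop is shown below; the dataset fields (`dialogue`, `label`) and the keyword baseline are illustrative assumptions, not the paper's actual format or model.

```python
# Hedged sketch: scoring a detector on a binary phishing-classification task.
# The example records and the keyword baseline are toy assumptions.

def evaluate_accuracy(examples, predict):
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(1 for ex in examples if predict(ex["dialogue"]) == ex["label"])
    return correct / len(examples)

def keyword_baseline(dialogue):
    """Toy stand-in for a model: flag dialogues containing a lure phrase."""
    lures = ("verify your account", "wire the money", "prosecutor's office")
    return "phishing" if any(p in dialogue.lower() for p in lures) else "benign"

examples = [
    {"dialogue": "Please verify your account number now.", "label": "phishing"},
    {"dialogue": "Hi mom, I'll be home for dinner.", "label": "benign"},
]
accuracy = evaluate_accuracy(examples, keyword_baseline)  # 1.0 on this toy set
```

A real benchmark run would swap `keyword_baseline` for a language-model call and report per-task accuracy, which is how the abstract's task-level comparison across model sizes would be produced.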
2024
Optimizing Language Augmentation for Multilingual Large Language Models: A Case Study on Korean
ChangSu Choi | Yongbin Jeong | Seoyoon Park | Inho Won | HyeonSeok Lim | SangMin Kim | Yejee Kang | Chanhyuk Yoon | Jaewan Park | Yiseul Lee | HyeJin Lee | Younggyun Hahm | Hansaem Kim | KyungTae Lim
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Large language models (LLMs) use pretraining to predict the subsequent word; however, their expansion requires significant computing resources. Numerous big tech companies and research institutes have developed multilingual LLMs (MLLMs) to meet current demands, overlooking less-resourced languages (LRLs). This study proposed three strategies to enhance the performance of LRLs based on publicly available MLLMs. First, the MLLM vocabularies of LRLs were expanded to enhance expressiveness. Second, bilingual data were used for pretraining to align the high- and less-resourced languages. Third, a high-quality small-scale instruction dataset was constructed and instruction-tuning was performed to augment the LRL. The experiments employed the Llama2 model and Korean was used as the LRL, which was quantitatively evaluated against other developed LLMs across eight tasks. Furthermore, a qualitative assessment was performed based on human evaluation and GPT-4. Experimental results showed that our proposed Bllossom model exhibited superior performance in qualitative analyses compared to previously proposed Korean monolingual models.
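The first strategy, vocabulary expansion, is commonly implemented by adding whole-word tokens for the target language and initializing each new embedding from the subwords it replaces. The sketch below illustrates that idea with a toy two-token vocabulary and mean-of-subwords initialization; the data structures and the character-level tokenizer are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of vocabulary expansion for a less-resourced language:
# add new whole-word tokens and initialize each new embedding as the mean
# of the embeddings of the subwords it replaces. Toy vocab and 2-dim
# embeddings are illustrative assumptions.

def expand_vocab(vocab, embeddings, new_tokens, tokenize):
    """vocab: token -> id; embeddings: list of vectors indexed by id."""
    for token in new_tokens:
        if token in vocab:
            continue
        pieces = tokenize(token)                       # existing subword split
        piece_vecs = [embeddings[vocab[p]] for p in pieces]
        mean = [sum(dims) / len(piece_vecs) for dims in zip(*piece_vecs)]
        vocab[token] = len(embeddings)                 # next free id
        embeddings.append(mean)                        # new row for new token
    return vocab, embeddings

vocab = {"한": 0, "국": 1}
embeddings = [[1.0, 0.0], [0.0, 1.0]]
vocab, embeddings = expand_vocab(
    vocab, embeddings, ["한국"], tokenize=lambda w: list(w)
)
# "한국" now has its own id, initialized to [0.5, 0.5]
```

With a real MLLM this corresponds to growing the tokenizer and resizing the model's embedding matrix before the bilingual pretraining stage the abstract describes.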
2023
Teddysum at MEDIQA-Chat 2023: an analysis of fine-tuning strategy for long dialog summarization
Yongbin Jeong | Ju-Hyuck Han | Kyung Min Chae | Yousang Cho | Hyunbin Seo | KyungTae Lim | Key-Sun Choi | Younggyun Hahm
Proceedings of the 5th Clinical Natural Language Processing Workshop
In this paper, we introduce the design and various attempts for TaskB of MEDIQA-Chat 2023. The goal of TaskB in MEDIQA-Chat 2023 is to generate a full clinical note from doctor-patient consultation dialogues. This task has several challenging issues, such as a lack of training data, handling long dialogue inputs, and generating semi-structured clinical notes that have section headers. To address these issues, we conducted various experiments and analyzed their results. We utilized the DialogLED model, pre-trained on long dialogue data, to handle long inputs, and we pre-trained on other dialogue datasets to address the lack of training data. We also attempted methods such as using prompts and contrastive learning for handling sections. This paper provides insights into clinical note generation through analyzing experimental methods and results, and it suggests future research directions.
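One standard way to cope with long dialogue inputs is to split the transcript into overlapping windows of turns and process each window separately. The sketch below shows that chunking step in isolation; it is an illustrative assumption about long-input handling in general (DialogLED itself extends the context window natively), not the paper's pipeline.

```python
# Hedged sketch: split a long dialogue into overlapping windows of turns so
# each window fits a limited context length. The window sizes are toy values.

def chunk_turns(turns, max_turns, overlap):
    """Split a list of dialogue turns into overlapping windows."""
    step = max_turns - overlap
    chunks = []
    for start in range(0, len(turns), step):
        chunks.append(turns[start:start + max_turns])
        if start + max_turns >= len(turns):
            break
    return chunks

turns = [f"turn {i}" for i in range(10)]
chunks = chunk_turns(turns, max_turns=4, overlap=1)
# windows cover turns 0-3, 3-6, 6-9, sharing one turn at each boundary
```

The overlap keeps boundary context (e.g., a question and its answer) inside at least one window, which matters for dialogues where relevant facts span adjacent turns.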
2020
Enhancing Quality of Corpus Annotation: Construction of the Multi-Layer Corpus Annotation and Simplified Validation of the Corpus Annotation
Youngbin Noh | Kuntae Kim | Minho Lee | Cheolhun Heo | Yongbin Jeong | Yoosung Jeong | Younggyun Hahm | Taehwan Oh | Hyonsu Choe | Seokwon Park | Jin-Dong Kim | Key-Sun Choi
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation
Co-authors
- Younggyun Hahm 4
- Key-Sun Choi 2
- Hansaem Kim 2
- KyungTae Lim 2
- Chanhyuk Yoon 2
- Kyung Min Chae 1
- Yousang Cho 1
- Hyonsu Choe 1
- ChangSu Choi 1
- Gyuri Choi 1
- Ju-Hyuck Han 1
- Cheolhun Heo 1
- Shea Husband 1
- Yoosung Jeong 1
- Yejee Kang 1
- Joeun Kang 1
- SangMin Kim 1
- Kuntae Kim 1
- Jin-Dong Kim 1
- Yiseul Lee 1
- HyeJin Lee 1
- Minho Lee 1
- HyeonSeok Lim 1
- Youngbin Noh 1
- Taehwan Oh 1
- Seoyoon Park 1
- Jaewan Park 1
- Seokwon Park 1
- Hyunbin Seo 1
- Inho Won 1