Abstract
To mitigate the annual financial losses caused by SMS phishing (smishing) in South Korea, we propose an explainable smishing detection framework that adapts to a Korean-centric large language model (LLM). Our framework not only classifies smishing attempts but also provides clear explanations, enabling users to identify and understand these threats. This end-to-end solution encompasses data collection, pseudo-label generation, and parameter-efficient task adaptation for models with fewer than five billion parameters. Our approach achieves a 15% improvement in accuracy over GPT-4 and generates high-quality explanatory text, as validated by seven automatic metrics and qualitative evaluation, including human assessments.

- Anthology ID: 2024.emnlp-industry.47
- Volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
- Month: November
- Year: 2024
- Address: Miami, Florida, US
- Editors: Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 642–656
- URL: https://aclanthology.org/2024.emnlp-industry.47
- DOI: 10.18653/v1/2024.emnlp-industry.47
- Cite (ACL): Yunseung Lee and Daehee Han. 2024. KorSmishing Explainer: A Korean-centric LLM-based Framework for Smishing Detection and Explanation Generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 642–656, Miami, Florida, US. Association for Computational Linguistics.
- Cite (Informal): KorSmishing Explainer: A Korean-centric LLM-based Framework for Smishing Detection and Explanation Generation (Lee & Han, EMNLP 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.emnlp-industry.47.pdf