Saide.saide@unilurio.ac.mz
2025
MOZ-Smishing: A Benchmark Dataset for Detecting Mobile Money Frauds
Felermino D. M. A. Ali
|
Henrique Lopes Cardoso
|
Rui Sousa-Silva
|
Saide.saide@unilurio.ac.mz
Proceedings of the Sixth Workshop on African Natural Language Processing (AfricaNLP 2025)
Despite the increasing prevalence of smishing attacks targeting Mobile Money Transfer systems, publicly available SMS phishing datasets for this domain are lacking. This study addresses that gap by creating a specialized dataset for detecting smishing attacks aimed at Mobile Money Transfer users. The dataset consists of crowd-sourced text messages from Mozambican mobile users, annotated into two categories: legitimate messages (ham) and fraudulent smishing attempts (spam). The messages are written in Portuguese and often incorporate microtext styles and linguistic nuances specific to the Mozambican context. We also investigate the effectiveness of large language models (LLMs) in detecting smishing. Using in-context learning approaches, we evaluate the models’ ability to identify smishing attempts without extensive task-specific training. The dataset is released under an open license at the following link: huggingface-Anonymous
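The in-context learning setup mentioned above can be illustrated with a minimal sketch of few-shot prompt construction. This is not the authors' prompt or pipeline: the instruction wording, the helper names `build_prompt` and `parse_label`, and the demonstration messages are all assumptions made for illustration, and the sample messages are invented placeholders, not taken from the dataset. Only the two labels (ham, spam) come from the abstract.

```python
from typing import List, Optional, Tuple

# The two categories named in the abstract.
LABELS = ("ham", "spam")


def build_prompt(examples: List[Tuple[str, str]], message: str) -> str:
    """Assemble a few-shot prompt: labeled demonstrations, then the query SMS.

    The instruction text below is a hypothetical wording, not the paper's.
    """
    lines = ["Classify each SMS as 'ham' (legitimate) or 'spam' (smishing).", ""]
    for text, label in examples:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        lines.extend([f"SMS: {text}", f"Label: {label}", ""])
    # The prompt ends with an open "Label:" slot for the model to complete.
    lines.extend([f"SMS: {message}", "Label:"])
    return "\n".join(lines)


def parse_label(completion: str) -> Optional[str]:
    """Map a raw model completion to 'ham' or 'spam'; None if it is neither."""
    words = completion.strip().lower().split()
    if not words:
        return None
    first = words[0].strip(".,:;!'\"")
    return first if first in LABELS else None


# Invented placeholder demonstrations (NOT from the dataset).
demos = [
    ("Recebeu 500MT. Para confirmar, envie o seu PIN para 84XXXXXXX.", "spam"),
    ("Ola, chego a casa as 18h.", "ham"),
]
prompt = build_prompt(demos, "Ganhou um premio! Envie o seu PIN agora.")
```

The resulting `prompt` string would be sent to whichever LLM is under evaluation, and `parse_label` applied to its reply. With an empty `examples` list the same function yields a zero-shot prompt, which makes it easy to compare settings with different numbers of demonstrations.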