Abstract
Cybersecurity risks such as malware threaten the personal safety of users, but identifying malware text is a major challenge. The paper proposes a supervised learning approach to identifying malware sentences in a given document (subTask1 of SemEval 2018, Task 8), as well as to classifying malware tokens in those sentences (subTask2). The approach ranked second of twelve participants in both subtasks, with F-scores of 57% for subTask1 and 28% for subTask2.
- Anthology ID:
- S18-1144
- Volume:
- Proceedings of the 12th International Workshop on Semantic Evaluation
- Month:
- June
- Year:
- 2018
- Address:
- New Orleans, Louisiana
- Editors:
- Marianna Apidianaki, Saif M. Mohammad, Jonathan May, Ekaterina Shutova, Steven Bethard, Marine Carpuat
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 890–893
- URL:
- https://aclanthology.org/S18-1144
- DOI:
- 10.18653/v1/S18-1144
- Cite (ACL):
- Utpal Kumar Sikdar, Biswanath Barik, and Björn Gambäck. 2018. Flytxt_NTNU at SemEval-2018 Task 8: Identifying and Classifying Malware Text Using Conditional Random Fields and Naïve Bayes Classifiers. In Proceedings of the 12th International Workshop on Semantic Evaluation, pages 890–893, New Orleans, Louisiana. Association for Computational Linguistics.
- Cite (Informal):
- Flytxt_NTNU at SemEval-2018 Task 8: Identifying and Classifying Malware Text Using Conditional Random Fields and Naïve Bayes Classifiers (Sikdar et al., SemEval 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/S18-1144.pdf