Introducing a New Dataset for Event Detection in Cybersecurity Texts
Hieu Man Duc Trong, Duc Trong Le, Amir Pouran Ben Veyseh, Thuat Nguyen, Thien Huu Nguyen
Abstract
Detecting cybersecurity events is necessary to keep us informed about the fast growing number of such events reported in text. In this work, we focus on the task of event detection (ED) to identify event trigger words for the cybersecurity domain. In particular, to facilitate the future research, we introduce a new dataset for this problem, characterizing the manual annotation for 30 important cybersecurity event types and a large dataset size to develop deep learning models. Comparing to the prior datasets for this task, our dataset involves more event types and supports the modeling of document-level information to improve the performance. We perform extensive evaluation with the current state-of-the-art methods for ED on the proposed dataset. Our experiments reveal the challenges of cybersecurity ED and present many research opportunities in this area for the future work.- Anthology ID:
- 2020.emnlp-main.433
- Volume:
- Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Month:
- November
- Year:
- 2020
- Address:
- Online
- Venue:
- EMNLP
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 5381–5390
- Language:
- URL:
- https://aclanthology.org/2020.emnlp-main.433
- DOI:
- 10.18653/v1/2020.emnlp-main.433
- Cite (ACL):
- Hieu Man Duc Trong, Duc Trong Le, Amir Pouran Ben Veyseh, Thuat Nguyen, and Thien Huu Nguyen. 2020. Introducing a New Dataset for Event Detection in Cybersecurity Texts. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5381–5390, Online. Association for Computational Linguistics.
- Cite (Informal):
- Introducing a New Dataset for Event Detection in Cybersecurity Texts (Man Duc Trong et al., EMNLP 2020)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2020.emnlp-main.433.pdf