Shellcode_IA32: A Dataset for Automatic Shellcode Generation
Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh
Abstract
We take the first step to address the task of automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, starting from natural language comments. We assemble and release a novel dataset (Shellcode_IA32), consisting of challenging but common assembly instructions with their natural language descriptions. We experiment with standard methods in neural machine translation (NMT) to establish baseline performance levels on this task.
- Anthology ID:
- 2021.nlp4prog-1.7
- Volume:
- Proceedings of the 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021)
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Editors:
- Royi Lachmy, Ziyu Yao, Greg Durrett, Milos Gligoric, Junyi Jessy Li, Ray Mooney, Graham Neubig, Yu Su, Huan Sun, Reut Tsarfaty
- Venue:
- NLP4Prog
- Publisher:
- Association for Computational Linguistics
- Pages:
- 58–64
- URL:
- https://aclanthology.org/2021.nlp4prog-1.7
- DOI:
- 10.18653/v1/2021.nlp4prog-1.7
- Cite (ACL):
- Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, and Samira Shaikh. 2021. Shellcode_IA32: A Dataset for Automatic Shellcode Generation. In Proceedings of the 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021), pages 58–64, Online. Association for Computational Linguistics.
- Cite (Informal):
- Shellcode_IA32: A Dataset for Automatic Shellcode Generation (Liguori et al., NLP4Prog 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2021.nlp4prog-1.7.pdf
- Code:
- dessertlab/Shellcode_IA32
- Data:
- Shellcode_IA32