DIRECT : A Transformer-based Model for Decompiled Identifier Renaming

Vikram Nitin; Anthony Saieva; Baishakhi Ray; Gail Kaiser

doi:10.18653/v1/2021.nlp4prog-1.6

Abstract

Decompiling binary executables to high-level code is an important step in reverse engineering scenarios, such as malware analysis and legacy code maintenance. However, the generated high-level code is difficult to understand since the original variable names are lost. In this paper, we leverage transformer models to reconstruct the original variable names from decompiled code. Inherent differences between code and natural language present certain challenges in applying conventional transformer-based architectures to variable name recovery. We propose DIRECT, a novel transformer-based architecture customized specifically for the task at hand. We evaluate our model on a dataset of decompiled functions and find that DIRECT outperforms the previous state-of-the-art model by up to 20%. We also present ablation studies evaluating the impact of each of our modifications. We make the source code of DIRECT available to encourage reproducible research.

Anthology ID: 2021.nlp4prog-1.6
Volume: Proceedings of the 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021)
Month: August
Year: 2021
Address: Online
Editors: Royi Lachmy, Ziyu Yao, Greg Durrett, Milos Gligoric, Junyi Jessy Li, Ray Mooney, Graham Neubig, Yu Su, Huan Sun, Reut Tsarfaty
Venue: NLP4Prog
Publisher: Association for Computational Linguistics
Pages: 48–57
URL: https://preview.aclanthology.org/ingest-emnlp/2021.nlp4prog-1.6/
DOI: 10.18653/v1/2021.nlp4prog-1.6
Cite (ACL): Vikram Nitin, Anthony Saieva, Baishakhi Ray, and Gail Kaiser. 2021. DIRECT : A Transformer-based Model for Decompiled Identifier Renaming. In Proceedings of the 1st Workshop on Natural Language Processing for Programming (NLP4Prog 2021), pages 48–57, Online. Association for Computational Linguistics.
Cite (Informal): DIRECT : A Transformer-based Model for Decompiled Identifier Renaming (Nitin et al., NLP4Prog 2021)
PDF: https://preview.aclanthology.org/ingest-emnlp/2021.nlp4prog-1.6.pdf
