ChatGPT as a Java Decompiler

Bradley Mcdanel, Zhanhao Liu


Abstract
We propose a novel approach using instruction-tuned large language models (LLMs), such as ChatGPT, to automatically decompile entire Java classes. Our method relies only on a textual representation of the Java bytecode and corresponding unit tests generated from the bytecode. While no additional domain knowledge or fine-tuning is performed, we provide a single training example of this decompilation process in the model’s prompt. To overcome both compilation errors and test failures, we use an iterative prompting approach. We find that ChatGPT-4 is able to generate more human-readable output than existing software-based decompilers while achieving slightly lower pass rates on unit tests. Source code and datasets are available at https://github.com/BradMcDanel/gpt-java-decompiler.
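The iterative prompting approach mentioned in the abstract can be pictured as a loop: prompt the model, check the result, and feed any failure back into the next prompt. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function name, the prompt wording, the `max_rounds` limit, and the three callables (`ask_model`, `compile_java`, `run_tests`) are all hypothetical stand-ins for an LLM client, a Java compiler wrapper, and a unit-test runner.

```python
from typing import Callable, Optional, Tuple

# Hypothetical type: a check takes Java source and returns (ok, error log).
Check = Callable[[str], Tuple[bool, str]]


def decompile_with_feedback(
    bytecode_text: str,
    one_shot_example: str,
    ask_model: Callable[[str], str],
    compile_java: Check,
    run_tests: Check,
    max_rounds: int = 3,
) -> Optional[str]:
    """Return Java source that compiles and passes the tests, or None.

    All parameters are illustrative assumptions; the paper's actual
    prompts and stopping criteria may differ.
    """
    # One worked example followed by the bytecode to decompile.
    prompt = f"{one_shot_example}\n\nBytecode:\n{bytecode_text}\n\nJava source:"
    for _ in range(max_rounds):
        source = ask_model(prompt)
        for stage, check in (("compilation", compile_java), ("unit test", run_tests)):
            ok, log = check(source)
            if not ok:
                # Append the failed attempt and its error log so the next
                # round can try to repair it.
                prompt = (
                    f"{prompt}\n{source}\n\n"
                    f"The code above failed at the {stage} stage:\n{log}\n\n"
                    "Corrected Java source:"
                )
                break
        else:
            return source  # both checks passed
    return None
```

Passing the model and the checkers in as callables keeps the loop independent of any particular LLM API or build tool, so it can be exercised with simple stubs.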
Anthology ID:
2023.gem-1.19
Volume:
Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Sebastian Gehrmann, Alex Wang, João Sedoc, Elizabeth Clark, Kaustubh Dhole, Khyathi Raghavi Chandu, Enrico Santus, Hooman Sedghamiz
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
224–232
URL:
https://aclanthology.org/2023.gem-1.19
Cite (ACL):
Bradley Mcdanel and Zhanhao Liu. 2023. ChatGPT as a Java Decompiler. In Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 224–232, Singapore. Association for Computational Linguistics.
Cite (Informal):
ChatGPT as a Java Decompiler (Mcdanel & Liu, GEM-WS 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-5/2023.gem-1.19.pdf