Abstract
We propose a novel approach using instruction-tuned large language models (LLMs), such as ChatGPT, to automatically decompile entire Java classes. Our method relies only on a textual representation of the Java bytecode and corresponding unit tests generated from the bytecode. While no additional domain knowledge or fine-tuning is performed, we provide a single training example of this decompilation process in the model’s prompt. To overcome both compilation errors and test failures, we use an iterative prompting approach. We find that ChatGPT-4 is able to generate more human-readable output than existing software-based decompilers while achieving slightly lower pass rates on unit tests. Source code and datasets are available at https://github.com/BradMcDanel/gpt-java-decompiler.
- Anthology ID:
- 2023.gem-1.19
- Volume:
- Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Sebastian Gehrmann, Alex Wang, João Sedoc, Elizabeth Clark, Kaustubh Dhole, Khyathi Raghavi Chandu, Enrico Santus, Hooman Sedghamiz
- Venues:
- GEM | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 224–232
- URL:
- https://aclanthology.org/2023.gem-1.19
- Cite (ACL):
- Bradley Mcdanel and Zhanhao Liu. 2023. ChatGPT as a Java Decompiler. In Proceedings of the Third Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 224–232, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- ChatGPT as a Java Decompiler (Mcdanel & Liu, GEM-WS 2023)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2023.gem-1.19.pdf
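
The iterative prompting approach mentioned in the abstract can be pictured as a feedback loop: generate a candidate class, check that it compiles and passes the unit tests, and feed any error text back into the next prompt. The following is only a minimal sketch under stated assumptions, not the authors' implementation. The function name `decompile_with_feedback`, the three callables, the prompt wording, and the `max_rounds` limit are all hypothetical, and the single in-prompt training example described in the abstract is omitted for brevity.

```python
from typing import Callable, Optional, Tuple

# Hypothetical callables: none of these names come from the paper or its repository.
Generate = Callable[[str], str]         # prompt -> candidate Java source
Check = Callable[[str], Optional[str]]  # Java source -> error text, or None on success


def decompile_with_feedback(
    bytecode_text: str,
    generate: Generate,
    compile_check: Check,
    test_check: Check,
    max_rounds: int = 3,  # assumed limit, not taken from the paper
) -> Tuple[Optional[str], int]:
    """Return (source, rounds_used); source is None if no candidate passed."""
    prompt = f"Decompile this Java bytecode into a Java class:\n{bytecode_text}"
    for round_no in range(1, max_rounds + 1):
        candidate = generate(prompt)
        # Compilation is checked first; tests only run on code that compiles.
        error = compile_check(candidate)
        if error is None:
            error = test_check(candidate)
        if error is None:
            return candidate, round_no
        # Append the failed attempt and its error so the next round can repair it.
        prompt = (
            f"{prompt}\n\nPrevious attempt:\n{candidate}\n\n"
            f"It failed with:\n{error}\nPlease fix the class."
        )
    return None, max_rounds
```

In practice, `generate` would wrap a call to the language model, while `compile_check` and `test_check` would wrap a Java compiler and a unit test runner. Passing them in as plain callables keeps the loop itself independent of any particular API or toolchain.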