HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations
Minh Nguyen, Nghi Bui, Truong Son Hy, Long Tran-Thanh, Tien Nguyen
Abstract
Code representation is important to machine learning models in code-related applications. Existing code summarization approaches primarily leverage Abstract Syntax Trees (ASTs) and sequential information from source code to generate summaries, while often overlooking the interplay of dependencies among code elements and the code hierarchy. However, effective summarization requires a holistic analysis of code snippets from three distinct aspects: lexical, syntactic, and semantic information. In this paper, we propose a novel code summarization approach utilizing Heterogeneous Code Representations (HCRs) and our specially designed HierarchyNet. HCRs capture essential code features at lexical, syntactic, and semantic levels within a hierarchical structure. HierarchyNet processes each layer of the HCR separately, employing a Heterogeneous Graph Transformer, a Tree-based CNN, and a Transformer Encoder. In addition, HierarchyNet demonstrates superior performance compared to fine-tuned pre-trained models, including CodeT5 and CodeBERT, as well as large language models used in zero-/few-shot settings, such as CodeLlama, StarCoder, and CodeGen. Implementation details can be found at https://github.com/FSoft-AI4Code/HierarchyNet.
- Anthology ID:
- 2024.findings-eacl.156
- Volume:
- Findings of the Association for Computational Linguistics: EACL 2024
- Month:
- March
- Year:
- 2024
- Address:
- St. Julian’s, Malta
- Editors:
- Yvette Graham, Matthew Purver
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2355–2367
- URL:
- https://aclanthology.org/2024.findings-eacl.156
- Cite (ACL):
- Minh Nguyen, Nghi Bui, Truong Son Hy, Long Tran-Thanh, and Tien Nguyen. 2024. HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations. In Findings of the Association for Computational Linguistics: EACL 2024, pages 2355–2367, St. Julian’s, Malta. Association for Computational Linguistics.
- Cite (Informal):
- HierarchyNet: Learning to Summarize Source Code with Heterogeneous Representations (Nguyen et al., Findings 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2024.findings-eacl.156.pdf