Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation

Xiangyu Zhang, Yu Zhou, Guang Yang, Wei Cheng, Taolue Chen


Abstract
The advent of large language models has significantly advanced automatic code generation, transforming the way programmers write code. Inspired by natural language processing, mainstream code generation approaches represent code as a linear sequence of tokens. In this paper, we propose to represent code snippets as two-dimensional entities, where both code lines and tokens within lines are explicitly modeled. This representation allows us to capture the hierarchical and spatial structure of code, especially the dependencies between code lines. Our method CoDE introduces a dependency encoding approach that leverages dictionary learning to perform semantic matching between code lines. As such, it avoids reliance on strict position indices, leading to better generalization to code with diverse contexts and lengths. We thoroughly evaluate CoDE on four categories of tasks. The experimental results showcase its generalizability, context understanding and retrieval, as well as interpretability in code generation.
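The abstract does not give implementation details, but the core idea of dictionary-based dependency encoding can be illustrated in a minimal sketch: each line embedding is soft-assigned to a set of learned dictionary atoms, and inter-line dependency scores are computed from those codes rather than from absolute position indices. All function names, the softmax assignment, and the dot-product matching below are assumptions for illustration, not the paper's actual formulation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dictionary_codes(line_vecs, atoms):
    """Soft-assign each line embedding to dictionary atoms.

    Hypothetical scheme: the code for a line is a softmax over its
    dot products with each atom, so it sums to 1.
    """
    codes = []
    for v in line_vecs:
        scores = [sum(vi * ai for vi, ai in zip(v, a)) for a in atoms]
        codes.append(softmax(scores))
    return codes

def line_dependency_scores(codes):
    # Pairwise similarity between lines via their dictionary codes,
    # independent of where each line sits in the sequence.
    n = len(codes)
    return [[sum(ci * cj for ci, cj in zip(codes[i], codes[j]))
             for j in range(n)] for i in range(n)]

# Toy example: lines 0 and 2 have similar embeddings, line 1 differs.
line_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
atoms = [[1.0, 0.0], [0.0, 1.0]]  # a tiny learned "dictionary"
codes = dictionary_codes(line_vecs, atoms)
deps = line_dependency_scores(codes)
```

Because the matching is driven by content (the codes) rather than positional offsets, the same dependency pattern is recovered wherever the lines appear, which is one way position-index-free encoding could generalize across contexts and lengths.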
Anthology ID:
2025.acl-long.308
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6157–6172
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.308/
Cite (ACL):
Xiangyu Zhang, Yu Zhou, Guang Yang, Wei Cheng, and Taolue Chen. 2025. Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6157–6172, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Beyond Sequences: Two-dimensional Representation and Dependency Encoding for Code Generation (Zhang et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.308.pdf