Towards Compositional Generalization in Code Search
Hojae Han, Seung-won Hwang, Shuai Lu, Nan Duan, Seungtaek Choi
Abstract
We study compositional generalization, which aims to generalize on unseen combinations of seen structural elements, for code search. Unlike existing approaches of partially pursuing this goal, we study how to extract structural elements, which we name a template that directly targets compositional generalization. Thus we propose CTBERT, or Code Template BERT, representing codes using automatically extracted templates as building blocks. We empirically validate CTBERT on two public code search benchmarks, AdvTest and CSN. Further, we show that templates are complementary to data flow graphs in GraphCodeBERT, by enhancing structural context around variables.
- Anthology ID:
- 2022.emnlp-main.737
- Volume:
- Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
- Month:
- December
- Year:
- 2022
- Address:
- Abu Dhabi, United Arab Emirates
- Editors:
- Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 10743–10750
- URL:
- https://aclanthology.org/2022.emnlp-main.737
- DOI:
- 10.18653/v1/2022.emnlp-main.737
- Cite (ACL):
- Hojae Han, Seung-won Hwang, Shuai Lu, Nan Duan, and Seungtaek Choi. 2022. Towards Compositional Generalization in Code Search. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10743–10750, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Cite (Informal):
- Towards Compositional Generalization in Code Search (Han et al., EMNLP 2022)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2022.emnlp-main.737.pdf