Towards Compositional Generalization in Code Search

Hojae Han, Seung-won Hwang, Shuai Lu, Nan Duan, Seungtaek Choi


Abstract
We study compositional generalization for code search, that is, generalizing to unseen combinations of seen structural elements. Unlike existing approaches, which pursue this goal only partially, we study how to extract structural elements, which we call templates, that directly target compositional generalization. We propose CTBERT, or Code Template BERT, which represents code using automatically extracted templates as building blocks. We empirically validate CTBERT on two public code search benchmarks, AdvTest and CSN. Further, we show that templates are complementary to the data flow graphs in GraphCodeBERT, as they enhance the structural context around variables.
Anthology ID:
2022.emnlp-main.737
Volume:
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates
Editors:
Yoav Goldberg, Zornitsa Kozareva, Yue Zhang
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
10743–10750
URL:
https://aclanthology.org/2022.emnlp-main.737
DOI:
10.18653/v1/2022.emnlp-main.737
Cite (ACL):
Hojae Han, Seung-won Hwang, Shuai Lu, Nan Duan, and Seungtaek Choi. 2022. Towards Compositional Generalization in Code Search. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10743–10750, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Cite (Informal):
Towards Compositional Generalization in Code Search (Han et al., EMNLP 2022)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2022.emnlp-main.737.pdf
Software:
2022.emnlp-main.737.software.zip
Dataset:
2022.emnlp-main.737.dataset.zip