Songwen Gong
2025
Fine-Grained Features-based Code Search for Precise Query-Code Matching
Xinting Zhang | Mengqiu Cheng | Mengzhen Wang | Songwen Gong | Jiayuan Xie | Yi Cai | Qing Li
Proceedings of the 31st International Conference on Computational Linguistics
Code search aims to quickly locate target code snippets in databases using natural language queries, which promotes code reusability. Existing methods can effectively obtain aligned code token-level and query word-level features. However, these studies usually represent the semantics of the code and the query by averaging the features of each token and each word, respectively, which makes it difficult to accurately capture the code details that are closely related to the query. To address this issue, we propose a fine-grained code search model that consists of a cross-modal encoder, a mapping layer, and a classification layer. Specifically, we use a pre-trained model, GraphCodeBERT, in the cross-modal encoder to align features. In the mapping layer, we introduce a co-attention network to capture the fine-grained interactions between code and query, enabling the model to precisely identify the key code segments relevant to the query. Finally, in the classification layer, we incorporate instruction learning techniques that leverage contextual reasoning to improve the accuracy of query-code matching. Experimental results show that our proposed model significantly outperforms existing methods across multiple programming language datasets.
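The abstract contrasts averaging token and word features with a co-attention mapping layer. As a rough illustration only, this sketch uses a generic co-attention formulation (a dot-product affinity matrix, row-wise and column-wise max pooling, then softmax weights), which is an assumption and not necessarily the paper's exact design. The GraphCodeBERT encoder and the instruction-based classification layer are not modeled; the toy vectors stand in for encoder outputs, and the helper names (`co_attention`, `mean_pool`, `cosine`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Vector, rows: Matrix) -> Vector:
    dim = len(rows[0])
    return [sum(w * row[d] for w, row in zip(weights, rows)) for d in range(dim)]


def mean_pool(rows: Matrix) -> Vector:
    # Baseline: average every token/word feature equally.
    return weighted_sum([1.0 / len(rows)] * len(rows), rows)


def cosine(u: Vector, v: Vector) -> float:
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def co_attention(code: Matrix, query: Matrix) -> Tuple[Vector, Vector, Vector, Vector]:
    """code: n x d token features; query: m x d word features (hypothetical)."""
    # affinity[i][j] scores code token i against query word j.
    affinity = [[dot(c, q) for q in query] for c in code]
    # Row-wise max pooling: each code token's best match to any query word.
    code_scores = [max(row) for row in affinity]
    # Column-wise max pooling: each query word's best match to any code token.
    query_scores = [max(affinity[i][j] for i in range(len(code))) for j in range(len(query))]
    code_weights = softmax(code_scores)
    query_weights = softmax(query_scores)
    code_vec = weighted_sum(code_weights, code)
    query_vec = weighted_sum(query_weights, query)
    return code_vec, query_vec, code_weights, query_weights


if __name__ == "__main__":
    # Toy features: only the first code token matches the single query word.
    code = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
    query = [[1.0, 0.0]]
    cv, qv, _, _ = co_attention(code, query)
    print(f"mean pooling: {cosine(mean_pool(code), mean_pool(query)):.3f}")
    print(f"co-attention: {cosine(cv, qv):.3f}")
```

On these toy vectors, mean pooling scores about 0.71 while co-attention scores about 0.93, because the query-relevant first token receives the largest weight instead of being averaged with unrelated tokens.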
Sequence Structure Aware Retriever for Procedural Document Retrieval: A New Dataset and Baseline
Zhenqi Ye | HaoPeng Ren | Yi Cai | Qingbao Huang | Jing Qin | Pinli Zhu | Songwen Gong
Findings of the Association for Computational Linguistics: EMNLP 2025
Execution failures are common in daily life when individuals perform procedural tasks, such as cooking or handicraft making. Retrieving relevant procedural documents that align closely with both the content of the steps and the overall execution sequence can help correct these failures with fewer modifications. However, existing retrieval methods, which primarily focus on declarative knowledge, often neglect the execution sequence structures inherent in procedural documents. To tackle this challenge, we introduce a new dataset, Procedural Questions, and propose a retrieval model, the Graph-Fusion Procedural Document Retriever (GFPDR), which integrates procedural graphs with document representations. Extensive experiments demonstrate the effectiveness of GFPDR, highlighting its superior performance in procedural document retrieval compared to existing models.