Semantic approaches to software component retrieval with English queries

Huijing Deng, Grzegorz Chrupała


Abstract
Enabling code reuse is an important goal in software engineering, and it depends crucially on effective code search interfaces. We propose to ground word meanings in source code and to use such language-code mappings to enable a search engine for programming library code where users can pose queries in English. We exploit the fact that there are large programming language libraries which are documented both via formally specified function or method signatures and via descriptions written in natural language. Automatically learned associations between words in descriptions and items in signatures allow us to use queries formulated in English to retrieve methods which are not documented via natural language descriptions, based only on their signatures. We show that the rankings returned by our model substantially outperform a strong term-matching baseline.
Anthology ID:
L14-1042
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3248–3252
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/106_Paper.pdf
Cite (ACL):
Huijing Deng and Grzegorz Chrupała. 2014. Semantic approaches to software component retrieval with English queries. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3248–3252, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Semantic approaches to software component retrieval with English queries (Deng & Chrupała, LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/106_Paper.pdf