Nihal Jain
2025
LibEvolutionEval: A Benchmark and Study for Version-Specific Code Generation
Sachit Kuhar | Wasi Uddin Ahmad | Zijian Wang | Nihal Jain | Haifeng Qian | Baishakhi Ray | Murali Krishna Ramanathan | Xiaofei Ma | Anoop Deoras
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Recent advancements in code completion models have primarily focused on local file contexts. However, these studies do not fully capture the complexity of real-world software development, which often requires the use of rapidly evolving public libraries. To address this gap, we introduce LibEvolutionEval, a comprehensive study that emphasizes the need to understand library evolution to perform accurate in-line code completions. LibEvolutionEval offers a version-specific code-completion task across eight libraries as they evolve over the years, along with an in-depth analysis of the evolution of two widely used and well-maintained public libraries: PyTorch and Matplotlib. We evaluate several popular models and find that public library evolution significantly affects their performance. To mitigate this, we explore how retrieving version-specific library documentation and prompt-based techniques can enhance model capability in dealing with these fast-evolving packages. This suggests a promising path forward for better handling fast-evolving libraries. Our tasks will be made publicly available upon acceptance.
2023
ContraCLM: Contrastive Learning For Causal Language Model
Nihal Jain | Dejiao Zhang | Wasi Uddin Ahmad | Zijian Wang | Feng Nan | Xiaopeng Li | Ming Tan | Ramesh Nallapati | Baishakhi Ray | Parminder Bhatia | Xiaofei Ma | Bing Xiang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Despite exciting progress in causal language models, the expressiveness of their representations is largely limited due to poor discrimination ability. To remedy this issue, we present CONTRACLM, a novel contrastive learning framework at both the token-level and the sequence-level. We assess CONTRACLM on a variety of downstream tasks. We show that CONTRACLM enhances the discrimination of representations and bridges the gap with encoder-only models, which makes causal language models better suited for tasks beyond language generation. Specifically, we attain 44% relative improvement on the Semantic Textual Similarity tasks and 34% on Code-to-Code Search tasks. Furthermore, by improving the expressiveness of representations, CONTRACLM also boosts the source code generation capability with 9% relative improvement on execution accuracy on the HumanEval benchmark.
Co-authors
- Wasi Ahmad 2
- Xiaofei Ma 2
- Baishakhi Ray 2
- Zijian Wang 2
- Parminder Bhatia 1