Lingyue Fu
2026
CoreCodeBench: Decoupling Code Intelligence via Fine-Grained Repository-Level Tasks
Lingyue Fu | Hao Guan | Bolun Zhang | Haowei Yuan | Yaoming Zhu | Lin Qiu | ZongYu Wang | Xuezhi Cao | Xunliang Cai | Weiwen Liu | Weinan Zhang | Yong Yu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The evaluation of Large Language Models (LLMs) for software engineering has shifted towards complex, repository-level tasks. However, existing benchmarks predominantly rely on coarse-grained pass rates that treat programming proficiency as a monolithic capability, obscuring specific cognitive bottlenecks. Furthermore, the static nature of these benchmarks renders them vulnerable to data contamination and performance saturation. To address these limitations, we introduce CoreCodeBench, a configurable repository-level benchmark designed to dissect coding capabilities through atomized tasks. Leveraging our automated framework, CorePipe, we extract and transform Python repositories into a comprehensive suite of tasks that isolate distinct cognitive demands within identical code contexts. Unlike static evaluations, CoreCodeBench supports controllable difficulty scaling to prevent saturation and ensures superior data quality. It achieves a 78.55% validity yield, significantly surpassing the 31.7% retention rate of SWE-bench-Verified. Extensive experiments with state-of-the-art LLMs reveal a significant capability misalignment, evidenced by distinct ranking shifts across cognitive dimensions. This indicates that coding proficiency is non-monolithic, as strength in one aspect does not necessarily translate to others. These findings underscore the necessity of our fine-grained taxonomy in diagnosing model deficiencies and offer a sustainable, rigorous framework for evolving code intelligence. The code for the CorePipe framework and the CoreCodeBench data are available at https://github.com/AGI-Eval-Official/CoreCodeBench and https://huggingface.co/collections/tubehhh/corecodebench.
2025
Train Once for All: A Transitional Approach for Efficient Aspect Sentiment Triplet Extraction
Xinmeng Hou | Lingyue Fu | Chenhao Meng | Kounianhua Du | Hai Hu
Findings of the Association for Computational Linguistics: EMNLP 2025
Aspect-Opinion Pair Extraction (AOPE) and Aspect Sentiment Triplet Extraction (ASTE) have drawn growing attention in NLP. However, most existing approaches extract aspects and opinions independently, optionally adding pairwise relations, often leading to error propagation and high time complexity. To address these challenges, and inspired by transition-based dependency parsing, we propose the first transition-based model for AOPE and ASTE that performs aspect and opinion extraction jointly, which also better captures position-aware aspect-opinion relations and mitigates entity-level bias. By integrating contrastive-augmented optimization, our model delivers more accurate action predictions and jointly optimizes separate subtasks in linear time. Extensive experiments on four commonly used ASTE/AOPE datasets show that our proposed transition-based model outperforms previous models on two out of the four datasets when trained on a single dataset. When multiple training sets are used, our proposed method achieves new state-of-the-art results on all datasets. We show that this is partly due to our model’s ability to benefit from transition actions learned from multiple datasets and domains. Our code is available at https://github.com/Paparare/trans_aste.