Zilei Wang


2026

The dominant Fill-in-the-Middle (FIM) paradigm for code completion is constrained by its rigid inability to correct contextual errors and reliance on unaligned, insecure Base models. While Chat LLMs offer safety and Agentic workflows provide flexibility, they suffer from performance degradation and prohibitive latency, respectively. To resolve this dilemma, we propose Search-and-Replace Infilling (SRI), a framework that internalizes the agentic verification-and-editing mechanism into a unified, single-pass inference process. By structurally grounding edits via an explicit search phase, SRI harmonizes completion tasks with the instruction-following priors of Chat LLMs, extending the paradigm from static infilling to dynamic context-aware editing. We synthesize a high-quality dataset, SRI-200K, and fine-tune the SRI-Coder series. Extensive evaluations demonstrate that with minimal data (20k samples), SRI-Coder enables Chat models to surpass the completion performance of their Base counterparts. Crucially, unlike FIM-style tuning, SRI preserves general coding competencies and maintains inference latency comparable to standard FIM. We release our dataset and models, establishing SRI as a robust, secure, and efficient alignment recipe for next-generation interactive development.
Vision-Language Models (VLMs) have demonstrated impressive capabilities in code generation across various domains. However, their ability to replicate complex, multi-panel visualizations from real-world data remains largely unassessed. To address this gap, we introduce RealChart2Code, a new large-scale benchmark with over 2,800 instances grounded in authentic datasets and featuring tasks with clear analytical intent. Crucially, it is the first benchmark to systematically evaluate chart generation from large-scale raw data and assess iterative code refinement in a multi-turn conversational setting. Our comprehensive evaluation of 14 leading VLMs on RealChart2Code reveals significant performance degradation compared to simpler benchmarks, highlighting their struggles with complex plot structures and authentic data. Our analysis uncovers a substantial performance gap between proprietary and open-weight models and confirms that even state-of-the-art VLMs often fail to accurately replicate intricate, multi-panel charts. These findings provide valuable insights into the current limitations of VLMs and guide future research directions.

2025

Large language models (LLMs) augmented with retrieval systems have significantly advanced natural language processing tasks by integrating external knowledge sources, enabling more accurate and contextually rich responses. To improve the robustness of such systems against noisy retrievals, Retrieval-Augmented Fine-Tuning (RAFT) has emerged as a widely adopted method. However, RAFT conditions models to generate answers even in the absence of reliable knowledge. This behavior undermines their reliability in high-stakes domains, where acknowledging uncertainty is critical. To address this issue, we propose Divide-Then-Align (DTA), a post-training approach designed to endow RAG systems with the ability to respond with “I don’t know” when the query is out of the knowledge boundary of both the retrieved passages and the model’s internal knowledge. DTA divides data samples into four knowledge quadrants and constructs tailored preference data for each quadrant, resulting in a curated dataset for Direct Preference Optimization (DPO). Experimental results on three benchmark datasets demonstrate that effectively balances accuracy with appropriate abstention, enhancing the reliability and trustworthiness of retrieval-augmented systems.

2023

Distantly supervised relation extraction (DSRE) aims to extract relational facts from texts but suffers from noisy instances. To mitigate the influence of noisy labels, current methods typically use the Multi-Instance-Learning framework to extract relations for each bag. However, these approaches are not capable of extracting relation labels for individual sentences. Several studies have focused on sentence-level DSRE to solve the above problem. These studies primarily aim to develop methods for identifying noisy samples and filtering them out to mitigate the impact of noise. However, discarding noisy samples directly leads to the loss of useful information. To this end, we propose SSLRE, a novel Semi-Supervised-Learning Relation Extraction framework for sentence-level DSRE. We discard only the labels of the noisy samples and utilize these instances without labels as unlabeled samples. Our SSLRE framework utilizes a weighted K-NN graph to select confident samples as labeled data and the rest as unlabeled. We then design a robust semi-supervised learning framework that can efficiently handle remaining label noise present in the labeled dataset, while also making effective use of unlabeled samples. Based on our experiments on two real-world datasets, the SSLRE framework we proposed has achieved significant enhancements in sentence-level relation extraction performance compared to the existing state-of-the-art methods. Moreover, it has also attained a state-of-the-art level of performance in bag-level relation extraction with ONE aggregation strategy.