Ziwei Wang

Papers on this page may belong to the following people: Ziwei Wang, Ziwei Wang

2026

Knowledge Tracing (KT) is a pivotal task in personalized education, aiming to predict students’ future performance based on their historical interactions. While prior work has focused on learning behavioral sequences using question IDs or surface-level textual features, these methods often fail to capture complex behavioral patterns due to a lack of deep reasoning capabilities and world knowledge. To address this, we propose LLM-KT, a novel framework that integrates the reasoning power of Large Language Models (LLMs) with the sequential modeling strengths of traditional KT methods via multi-level plug-and-play alignment. Specifically, for task-level alignment, we design a plug-and-play instruction to leverage the rich knowledge and reasoning capacity of LLMs for the KT objective. For modality-level alignment, we introduce two mechanisms to integrate representations learned by traditional methods: (1) a Semantic History Projector that flexibly inserts compressed context embeddings into LLMs using question- and concept-specific tokens to capture long-term history; and (2) a Behavioral Dynamics Projector that enhances LLMs with sequential interaction patterns via a sequence adapter. Extensive experiments on four standard datasets demonstrate that LLM-KT achieves state-of-the-art performance, significantly outperforming over 20 competitive baselines.

2025


Stealthy data poisoning during fine-tuning can backdoor large language models (LLMs), threatening downstream safety. Existing detectors either use classifier-style probability signals—ill-suited to generation—or rely on rewriting, which can degrade quality and even introduce new triggers. We address the practical need to efficiently remove poisoned examples before or during fine-tuning. We observe a robust signal in the response space: after applying TF-IDF to model responses, poisoned examples form compact clusters (driven by consistent malicious outputs), while clean examples remain dispersed. We leverage this with RFTC—Reference-Filtration + TF-IDF Clustering. RFTC first compares each example’s response with that of a reference model and flags those with large deviations as suspicious; it then performs TF-IDF clustering on the suspicious set and identifies true poisoned examples using intra-class distance. On two machine translation datasets and one QA dataset, RFTC outperforms prior detectors in both detection accuracy and the downstream performance of the fine-tuned models. Ablations with different reference models further validate the effectiveness and robustness of Reference-Filtration.
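The response-space observation in this abstract can be illustrated with a minimal sketch. It is not the RFTC implementation: the whitespace tokenizer, the smoothed IDF formula, the toy responses, and the helper names (`tfidf_vectors`, `cosine_distance`, `mean_pairwise_distance`) are all assumptions made for illustration, and the reference-model filtration and clustering steps are omitted. It only shows that near-identical malicious outputs sit closer together in TF-IDF space than unrelated clean outputs.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF dict per text (whitespace tokens, smoothed IDF)."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    # Document frequency: number of texts in which each term appears.
    df = Counter(term for d in docs for term in d)
    vectors = []
    for d in docs:
        total = sum(d.values())
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in d.items()
        })
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_pairwise_distance(vectors):
    """Average distance over all pairs; a small value means a compact group."""
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    if not pairs:
        return 0.0
    return sum(cosine_distance(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


# Toy responses (hypothetical): poisoned outputs repeat one malicious payload.
poisoned = [
    "visit evil.example now",
    "visit evil.example now please",
    "please visit evil.example now",
]
clean = [
    "the cat sat on the mat",
    "stock prices rose sharply today",
    "translate this sentence into french",
]

# Fit TF-IDF on all responses together, then compare the spread of each group.
vectors = tfidf_vectors(poisoned + clean)
poisoned_spread = mean_pairwise_distance(vectors[: len(poisoned)])
clean_spread = mean_pairwise_distance(vectors[len(poisoned):])
```

In this toy setup the groups are labeled by hand; the method described in the abstract instead obtains a suspicious set by comparing against a reference model and then separates poisoned examples by intra-class distance after clustering.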


In competitive programming tasks, problem statements are often embedded within elaborate narrative backgrounds, requiring deep understanding of the underlying solutions to successfully complete the tasks. Current code generation models primarily focus on token-level semantic modeling and are highly susceptible to distractions from irrelevant narrative statements. Inspired by RAG, retrieving reference code with similar solutions may help enhance model performance on difficult problems. However, existing retrieval models also emphasize surface-level semantic similarity, neglecting the deeper solution-level logical similarities that are critical in competitive programming. Therefore, designing ranking models capable of accurately identifying and retrieving problems and corresponding codes remains an urgent research problem in competitive code generation. In this paper, we propose SolveRank, a solution-aware ranking model empowered by synthetic data for competitive programming tasks. Specifically, we leverage the DeepSeek-R1 model to generate logically equivalent but differently phrased new problems, verified by GPT-4o for solution consistency. Then, we train SolveRank with these as positive samples and BM25/random-retrieved problems as negatives. During inference, SolveRank retrieves relevant problems and corresponding code from the corpus to assist a downstream code generator. Experiments on the xCodeEval dataset demonstrate that SolveRank outperforms SOTA ranking methods in precision and recall metrics, and boosts code generation performance for difficult problems.
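The training-data construction mentioned in this abstract (synthetic rephrasings as positives, BM25-retrieved problems as negatives) could be sketched roughly as follows. This is an illustrative assumption, not the SolveRank pipeline: the BM25 variant and its parameters, the toy problem statements, and the names `bm25_scores` and `build_triplets` are hypothetical, and the positive is written by hand here rather than generated by DeepSeek-R1 or verified by GPT-4o.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every corpus document against the query with a basic BM25 variant."""
    docs = [doc.lower().split() for doc in corpus]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(query.lower().split())
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def build_triplets(anchors, positives, corpus, num_negatives=1):
    """Pair each anchor with its positive and with top BM25 hits as negatives."""
    triplets = []
    for anchor, positive in zip(anchors, positives):
        scores = bm25_scores(anchor, corpus)
        ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
        # Skip the anchor itself and its positive; keep the highest-scoring rest.
        negatives = [corpus[i] for i in ranked if corpus[i] not in (anchor, positive)]
        triplets.extend((anchor, positive, neg) for neg in negatives[:num_negatives])
    return triplets


# Toy corpus (hypothetical problem statements).
corpus = [
    "alice collects coins find the maximum sum of non adjacent coins",
    "bob walks a road find the shortest path between two cities",
    "alice collects stamps count the pairs of stamps with equal value",
]
anchor = corpus[0]
# Same underlying solution as the anchor, different narrative (hand-written here).
positive = "a robber picks houses in a row maximize loot without picking neighbors"

triplets = build_triplets([anchor], [positive], corpus)
```

The negative selected here shares the "alice collects" narrative with the anchor but requires a different solution, which is the kind of surface-level match a solution-aware ranker would need to learn to rank below the positive.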

Co-authors

Fei Sun 1

Venues

Findings 3
