Xiawu Zheng

2026

Factual knowledge stored in Large Language Models (LLMs) inevitably becomes outdated or erroneous over time, making it critical to update these models without incurring the high cost of retraining. Existing sequential knowledge editing methods predominantly rely on strict orthogonal projection to preserve previously edited knowledge. However, this excessive constraint limits gradient expressiveness, resulting in a significant degradation of model generalization and overall performance as the number of edits increases. To address this challenge, we propose Dual-Importance Projection Editing (DipEdit). This method leverages Singular Value Decomposition (SVD) to identify critical gradient subspaces and introduces a dual mechanism comprising "accumulated importance" and "projection importance." Unlike traditional approaches that enforce strict orthogonality, DipEdit dynamically scales gradient components parallel to key subspaces based on their projection importance rather than discarding them directly. This approach enhances the model’s adaptability to new knowledge while maximally preserving historical knowledge. Extensive experiments conducted on five mainstream LLMs using the ZsRE and Counterfact datasets demonstrate that DipEdit effectively handles thousands of sequential edits. The proposed method achieves an average comprehensive performance improvement of 10.36% and effectively maintains the model’s general capabilities on downstream tasks. Code is available at: https://github.com/czhhhla/DipEdit.
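The abstract contrasts strict orthogonal projection, which discards every gradient component inside the critical subspace, with scaling those components instead. The NumPy sketch below shows only that contrast. The damping rule (direction i is shrunk by α_i = s_i / s_1, where s_i are the singular values of a history matrix) and the names `history`, `rank`, and `importance_scaled_projection` are assumptions made for this example. The abstract does not give DipEdit's actual formulas for accumulated or projection importance.

```python
import numpy as np

def strict_orthogonal_projection(grad, basis):
    """Baseline: delete every component of grad that lies in span(basis)."""
    return grad - basis @ (basis.T @ grad)

def importance_scaled_projection(grad, history, rank):
    """Hypothetical soft projection (an illustration, not DipEdit's exact rule).

    grad:    (d, m) update proposed for the current edit
    history: (d, n) matrix whose columns summarize previously edited knowledge
    rank:    number of leading singular directions treated as critical
    """
    # Leading left singular vectors of the history span the critical subspace.
    u, s, _ = np.linalg.svd(history, full_matrices=False)
    u_k, s_k = u[:, :rank], s[:rank]
    # Assumed importance weights in [0, 1]. Singular values are sorted in
    # descending order, so the strongest direction gets alpha == 1.
    alpha = s_k / max(s_k[0], 1e-12)
    # Coefficients of the update along each critical direction: (rank, m).
    coeffs = u_k.T @ grad
    # Shrink the parallel components by alpha instead of discarding them all.
    return grad - u_k @ (alpha[:, None] * coeffs)
```

If every α_i is set to 1, the second function reduces to the strict projection. Weaker directions instead keep part of the update, which gives the extra gradient freedom the abstract argues for. Those directions still remain damped.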


Algorithm Visualization (AV) helps students build mental models by animating algorithm execution states. Recent LLM-based systems such as CODE2VIDEO generate AV videos end to end. However, this paradigm requires the system to simulate the algorithm flow and satisfy video rendering constraints (element layout, color schemes, etc.) at the same time. This complex task induces LLM hallucinations, which lead to lower execution success rates, overlapping elements, and inconsistencies between frames. To address these challenges, we propose ALGOGEN, a novel paradigm that decouples algorithm execution from rendering. We first introduce Visualization Trace Algebra (VTA), a monoid over algorithm visual states and operations. The LLM then generates a Python tracker that simulates the algorithm flow and outputs VTA-JSON traces, a JSON encoding of VTA. For rendering, we define a Rendering Style Language (RSL) to templatize algorithm layouts. A deterministic renderer then compiles algorithm traces with RSL into Manim, LaTeX/TikZ, or Three.js outputs (Manim, TikZ, and Three.js are, respectively, a Python animation engine, a LaTeX vector graphics package, and a JavaScript 3D rendering library). On a LeetCode AV benchmark of 200 tasks, ALGOGEN improves the average success rate by 17.3 percentage points over end-to-end methods (99.8% vs. 82.5%). These results demonstrate that our decoupling paradigm effectively mitigates LLM hallucinations in complex AV tasks and provides a more reliable solution for the automated generation of high-quality algorithm visualizations. Demo videos and code are available at: .
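The sketch below illustrates the decoupling idea in its simplest form. A tracker records operations into a trace, traces form a monoid under concatenation (with the empty trace as the identity), and a separate deterministic function replays the trace. The `Trace` type, the `init`/`compare`/`swap` operation names, and `replay` are hypothetical stand-ins. They are not the paper's VTA definition, VTA-JSON schema, or RSL renderer.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Trace:
    """Hypothetical trace type: a sequence of visual operations."""
    ops: tuple = ()

    def __add__(self, other):
        # The monoid operation is concatenation, which is associative.
        return Trace(self.ops + other.ops)

EMPTY = Trace()  # identity element: EMPTY + t == t + EMPTY == t

def op(kind, **args):
    """Wrap a single operation as a one-element trace."""
    return Trace(({"op": kind, **args},))

def bubble_sort_tracker(values):
    """Tracker: run bubble sort and record every compare/swap in a trace."""
    arr = list(values)
    trace = op("init", array=list(arr))
    for end in range(len(arr) - 1, 0, -1):
        for i in range(end):
            trace = trace + op("compare", i=i, j=i + 1)
            if arr[i] > arr[i + 1]:
                arr[i], arr[i + 1] = arr[i + 1], arr[i]
                trace = trace + op("swap", i=i, j=i + 1)
    return arr, trace

def replay(trace):
    """Renderer stand-in: rebuild one array state per operation, deterministically."""
    frames, state = [], []
    for step in trace.ops:
        if step["op"] == "init":
            state = list(step["array"])
        elif step["op"] == "swap":
            i, j = step["i"], step["j"]
            state[i], state[j] = state[j], state[i]
        frames.append(list(state))
    return frames
```

Calling `json.dumps(list(trace.ops))` serializes the trace to JSON. The replay step depends only on that data, not on the code that produced it.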

2025


The Mixture of Experts (MoE) architecture enables efficient model scaling through conditional computation, where only a subset of parameters is activated for each input. However, this distributed architecture poses unprecedented challenges for model compression, because conventional quantization methods optimized for dense networks prove inadequate. This paper introduces a specialized quantization framework for MoE architectures. It is motivated by our discovery that weight matrices across expert networks exhibit distinctive channel-wise outlier distributions, which call for a more nuanced compression approach. Through theoretical analysis based on Fisher information matrices and condition numbers, we establish a fundamental relationship between layer functionality and quantization sensitivity, showing that down-projection layers inherently demand higher precision than up-projection layers. Leveraging these insights, we develop an automated channel-wise quantization framework that dynamically determines optimal bit-width allocations and uses efficient statistical approximations to keep computational overhead minimal. When evaluated on Mixtral-8x7B-v0.1, our method achieves a 3.96% improvement over existing state-of-the-art approaches across natural language understanding benchmarks and also attains superior compression ratios.
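As a rough illustration of channel-wise mixed-precision quantization, the sketch below applies symmetric uniform quantization to each output channel (row) with its own bit-width. It gives the extra bits to the channels whose peak magnitude is largest relative to their RMS. The outlier score, the `low`/`high`/`high_fraction` parameters, and both function names are assumptions. The paper's allocation is derived from Fisher information and condition-number analysis, which this heuristic does not reproduce.

```python
import numpy as np

def quantize_per_channel(w, bits):
    """Symmetric uniform quantization of each row of w with its own bit-width.

    bits: integer array with one bit-width (>= 2) per row.
    Returns the dequantized weights so the error can be measured directly.
    """
    out = np.empty_like(w)
    for r, b in enumerate(bits):
        qmax = 2 ** (int(b) - 1) - 1          # e.g. 3 bits -> levels -3..3
        peak = np.abs(w[r]).max()
        scale = peak / qmax if peak > 0 else 1.0
        out[r] = np.clip(np.round(w[r] / scale), -qmax, qmax) * scale
    return out

def allocate_bits(w, low=3, high=4, high_fraction=0.25):
    """Hypothetical heuristic: give extra bits to channels with outliers."""
    # Outlier score: peak magnitude relative to the channel's RMS.
    rms = np.sqrt(np.mean(w ** 2, axis=1)) + 1e-12
    score = np.abs(w).max(axis=1) / rms
    n_high = int(round(high_fraction * w.shape[0]))
    bits = np.full(w.shape[0], low)
    bits[np.argsort(score)[::-1][:n_high]] = high
    return bits
```

To echo the finding that down-projection layers need more precision, one could pass a larger `high_fraction` (or higher `low`) for down-projection matrices than for up-projection ones. That knob is an illustrative assumption, not a setting from the paper.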

Large Language Model (LLM)-based agents have excelled in various domains but face significant challenges in data science workflows because of their complex, multi-stage nature. Current LLM-based agents struggle with non-linear relationships, recursive dependencies, implicit data- and logic-dependent reasoning, and managing extensive context. In this paper, we introduce Data Interpreter, an LLM-based agent that addresses these challenges with two components: hierarchical graph-based modeling that represents this complexity, and a progressive strategy for step-by-step verification, refinement, and consistent context management. Extensive experiments confirm the effectiveness of Data Interpreter. On InfiAgent-DABench, it boosts performance by a relative 25% (from 75.9% to 94.9%). On machine learning and open-ended tasks, it lifts accuracy from 88% to 95% and from 60% to 97%, respectively. Moreover, our method surpasses state-of-the-art baselines by 26% on the MATH dataset. We will release the code upon publication.
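The sketch below shows the general pattern of dependency-ordered execution with per-step verification and retry. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) on a flat task graph. The `tasks`/`deps`/`verify` structure, the `attempt` argument that lets a step behave differently on retry, and the toy data-cleaning tasks are hypothetical. Data Interpreter's hierarchical graph and LLM-driven refinement are considerably richer than this.

```python
from graphlib import TopologicalSorter

def run_plan(tasks, deps, verify, max_retries=2):
    """Execute tasks in dependency order, verifying and retrying each step.

    tasks:  name -> callable(context, attempt) returning a result
            (stand-ins for generated code steps; attempt > 0 means "refine")
    deps:   name -> set of prerequisite task names
    verify: callable(name, result) -> bool
    """
    context = {}  # shared results, so later steps see earlier outputs
    for name in TopologicalSorter(deps).static_order():
        for attempt in range(max_retries + 1):
            result = tasks[name](context, attempt)
            if verify(name, result):
                context[name] = result
                break
        else:  # no attempt passed verification
            raise RuntimeError(f"task {name!r} failed verification")
    return context

# Toy plan: the "clean" step fails verification once, then is refined.
tasks = {
    "load": lambda ctx, a: [3, None, 5, 4],
    "clean": lambda ctx, a: ctx["load"] if a == 0
    else [x for x in ctx["load"] if x is not None],
    "mean": lambda ctx, a: sum(ctx["clean"]) / len(ctx["clean"]),
}
deps = {"load": set(), "clean": {"load"}, "mean": {"clean"}}
verify = lambda name, r: name != "clean" or None not in r
```

Running `run_plan(tasks, deps, verify)` executes `load`, retries `clean` once, and then computes `mean` from the verified output.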


Large Language Models (LLMs) have demonstrated strong performance in sequential recommendation owing to their robust language modeling and comprehension capabilities. In this paradigm, the item texts of an interaction sequence are formulated as sentences, and LLMs are used either to learn language representations or to directly generate target item texts by incorporating instructions. Despite their promise, these methods focus solely on modeling the mapping from sequential texts to target items and neglect the relationships between the items in an interaction sequence. As a result, they fail to learn the transition patterns between items, which reflect dynamic changes in user preferences and are crucial for predicting the next item. To tackle this issue, we propose ST2SI, a novel framework that maps sequential item texts to sequential item IDs. Specifically, we first introduce multi-query input and item linear projection (ILP) to model the conditional probability distribution of items. We then propose ID alignment, which uses instruction tuning to address the misalignment between item texts and item IDs. Finally, we propose efficient ILP tuning, which adapts flexibly to different scenarios and achieves competitive performance while training only a linear layer. Extensive experiments on six real-world datasets show that our approach outperforms the best baselines by 7.33% in NDCG@10, 4.65% in Recall@10, and 8.42% in MRR.
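The core of an item linear projection can be sketched as a single trainable linear layer. It maps a fixed sequence representation to logits over item IDs, followed by a softmax. The NumPy sketch below trains only that layer with cross-entropy gradient steps, in the spirit of the "train only a linear layer" setting. The class name, the random stand-in features `h`, and the learning rate are assumptions. The multi-query input and the LLM producing the representations are not modeled here.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class ItemLinearProjection:
    """Hypothetical ILP head: frozen sequence features -> distribution over item IDs."""

    def __init__(self, hidden_dim, num_items, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(hidden_dim, num_items))
        self.b = np.zeros(num_items)

    def probs(self, h):
        """h: (batch, hidden_dim) representations from a frozen LLM."""
        return softmax(h @ self.W + self.b)

    def train_step(self, h, targets, lr=0.05):
        """One cross-entropy gradient step; only W and b are updated."""
        grad_logits = self.probs(h)
        grad_logits[np.arange(len(targets)), targets] -= 1.0  # dL/dlogits = p - onehot
        grad_logits /= len(targets)
        self.W -= lr * h.T @ grad_logits
        self.b -= lr * grad_logits.sum(axis=0)
```

At inference, ranking items by `np.argsort(-head.probs(h), axis=1)[:, :10]` yields the top-10 lists that metrics such as Recall@10 and NDCG@10 are computed from.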