Ling Li

2026

Surface-level pattern mapping is a critical yet underexplored failure mode in large language model (LLM) reasoning, and it is particularly acute in cross-architecture code migration of high-performance libraries. On low-resource, low-level code, insufficient coverage in pretraining data often leads LLMs to rely on superficial name- or type-based correspondences rather than on principled refactoring and reasoning grounded in core functional semantics and architecture-specific optimization intents. This tendency severely hampers the effectiveness of LLMs in complex migration scenarios. To address these challenges, we propose FSCM, a multi-agent framework for cross-architecture migration. FSCM decouples complex implementation details through functional mining and code refactoring, guiding LLMs to focus on invariant semantic anchors across architectures. By mitigating surface-level pattern traps, FSCM improves both functional correctness and performance when targeting emerging architectures. Extensive experiments on challenging real-world OpenCV library migration tasks demonstrate substantial improvements over state-of-the-art baselines, achieving up to 22% higher correctness rates than Copilot and 43.04x speedup on RISC-V platforms. Code and data are available at: https://anonymous.4open.science/r/code-F8D4.

Advances in Multimodal Large Language Models (MLLMs) intensify concerns about data safety, making Machine Unlearning (MU), the selective removal of harmful or private information, a critical necessity. However, existing MU benchmarks for MLLMs are limited by a lack of image diversity, coarse-grained unlearning targets, and insufficient evaluation scenarios, and so fail to capture the complexity of real-world applications. To facilitate the development of MLLM unlearning and alleviate these limitations, we introduce OFFSIDE, a novel benchmark for evaluating misinformation unlearning in MLLMs. This manually curated dataset contains 15.68K records for 80 players, providing a comprehensive framework with four test sets to assess forgetting efficacy, generalization, utility, and robustness. OFFSIDE supports advanced unlearning targets, such as fine-grained unlearning and visual rumor removal. Our extensive evaluation of multiple baselines not only extends key findings from LLM MU to MLLM MU, namely that (1) unlearned rumors can be easily recovered through relearning and (2) all methods are vulnerable to prompt attacks, but also introduces novel insights specific to MLLMs: (1) unimodal methods fail to handle multimodal rumors, (2) unlearning efficacy is, statistically, primarily driven by catastrophic forgetting, and (3) all methods struggle with visual rumors (rumors embedded in images). These results expose significant vulnerabilities in current approaches, highlighting the need for more robust multimodal unlearning solutions.

2025

The attention operator remains a critical performance bottleneck in large language models (LLMs), particularly in long-context scenarios. While FlashAttention is the most widely used and effective GPU-aware acceleration algorithm, it requires time-consuming, hardware-specific manual implementation, which limits its adaptability across GPU architectures. Existing LLMs have shown considerable promise in code generation tasks, but they struggle to generate high-performance attention code. The key challenge is that they cannot comprehend the complex data flow and computation process of the attention operator or use low-level primitives to exploit GPU performance. To address this challenge, we propose an LLM-friendly Thinking Language (LLM-TL) that helps LLMs decouple the generation of high-level optimization logic from low-level implementation on GPUs and enhances LLMs' understanding of the attention operator. Together with a two-stage reasoning workflow, TL-Code generation and translation, LLMs can automatically generate FlashAttention implementations on diverse GPUs, establishing a self-optimizing paradigm for generating high-performance attention operators in attention-centric algorithms. Verified on A100, RTX8000, and T4 GPUs, our method significantly outperforms vanilla LLMs, achieving a speedup of up to 35.16×. In addition, our method not only surpasses human-optimized libraries (cuDNN and the official library) in most scenarios but also extends support to previously unsupported hardware and data types, reducing development time from months to minutes compared with human experts.

The discovery of novel mechanical metamaterials, whose properties are dominated by their engineered structures rather than their chemical composition, is a knowledge-intensive and resource-demanding process. To accelerate the design of novel metamaterials, we present MetaScientist, a human-in-the-loop system that integrates advanced AI capabilities with expert oversight in two primary phases: (1) hypothesis generation, where the system performs complex reasoning to generate novel and scientifically sound hypotheses, supported by domain-specific foundation models and inductive biases retrieved from existing literature; and (2) 3D structure synthesis, where a 3D structure is synthesized from the textual hypothesis by a novel 3D diffusion model and then refined by an LLM-based refinement model to achieve better structural properties. At each phase, domain experts iteratively validate the system outputs and provide feedback and supplementary materials to ensure that the outputs align with scientific principles and human preferences. Extensive evaluation by human scientists shows that MetaScientist delivers novel and valid mechanical metamaterial designs that have the potential to be highly impactful in the metamaterial field.