Tianlong Wang


2026

Prompt-based in-context learning (ICL) and parameter fine-tuning are two dominant paradigms for incorporating external information into large language models (LLMs), but they incur high inference costs or require expensive retraining. To bridge this gap, context-to-parameter mapping converts prompts into temporary adapter weights. However, we identify a critical failure mode in existing methods: *hidden-state collapse*, where the adapter-augmented model’s internal states diverge sharply from the full-context oracle in deeper layers. We trace this failure to two coupled gaps: suboptimal **Input-Selection** and inadequate **Supervision-Signal**. To address these issues, we propose SADA (**S**tate-**A**ligned **D**istillation **A**dapters). We establish the *attention-block output* as a principled feature interface to improve input selection and introduce *state-alignment distillation* to enforce consistency between the adapter-augmented model and the full-context oracle. Experiments on long-context language modeling (PG19) and downstream NLU and summarization benchmarks show that SADA consistently outperforms strong baselines like *StreamAdapter* and *GenerativeAdapter*, achieving performance comparable to ICL while significantly reducing memory footprint and latency. We further analyze when parameterized context compression is effective and when explicit context retention remains preferable. Our code is available at [https://github.com/Taylor-Gavel/SADA.git](https://github.com/Taylor-Gavel/SADA.git).
Large Vision-Language Models (LVLMs) excel at visual understanding but face severe computational bottlenecks when processing high-resolution images and long videos due to massive visual token counts. Token pruning mitigates this by selectively removing less informative tokens while maintaining performance. However, existing methods vary widely in pruning location (vision encoder vs. LLM decoder), importance criteria (attention vs. similarity vs. learned scores), and application strategy, lacking systematic comparison. This survey presents the first comprehensive review of token pruning for LVLMs. We propose a taxonomy categorizing methods into vision-side, LLM-side, and hybrid paradigms, systematically analyze token selection mechanisms and pruning strategy. We further discuss evaluation protocols and identify key challenges including prompt-adaptive pruning and hardware-aware design. Our survey provides a structured foundation for this rapidly growing research area.

2025

Reinforcement learning (RL) on self-generated data has emerged as a promising paradigm for improving reasoning in large language models (LLMs). However, RL relies on accurate reward signals, which are scarce in many domains, making it critical to train models that can generalize to unseen problems. Existing methods often focus on task-specific or domain-specific reasoning, lacking consideration for generalization and may degrade performance on other tasks. To address this, we distinguish between abstract plans, representing high-level problem-solving strategies, and concrete solutions, proposing that learning plans develops transferable general reasoning capabilities and promotes better generalization. Building on this insight, we propose PlanLearn, a framework that combines plan-based search with Step-level Advantage Preference Optimization (Step-APO) to optimize plan learning. Experimental results show that PlanLearn, trained exclusively on GSM8K and MATH, not only significantly improves in-domain performance but also enhances out-of-domain benchmarks, such as HumanEval (+12.2%), GPQA (+8.6%), ARC-C (+4.0%), MMLU-STEM (+2.2%), and BBH (+1.8%). The code is available at https://github.com/tianlwang/PlanLearn.