Pavel Tikhonov


2026

We challenge the common assumption that Large Language Models (LLMs) build confidence gradually during reasoning. Instead, we find that conviction is often reached in a discrete "moment of insight", characterized by a sudden and sharp increase in an answer’s probability-a phenomenon we term a "confidence leap". Leveraging this discovery, we introduce a training-free, model-agnostic early-stopping heuristic that halts generation upon detecting such a leap, significantly reducing the generation length without sacrificing accuracy. We also demonstrate that the reasoning text leading up to this leap is semantically potent and transferable: feeding this partial reasoning to a different model family substantially boosts its performance. This suggests that the "confidence leap" marks a shared, interpretable reasoning milestone, not just a model-specific statistical artifact.
In-context learning (ICL) enables Large Language Models (LLMs) to adapt to new tasks using few examples, with task vectors, defined as specific hidden state activations hypothesized to encode task information. Existing studies are limited by small-scale benchmarks, restricting comprehensive analysis. We introduce QᴜɪᴛᴇAFᴇᴡ, a novel dataset of 3,096 diverse few-shot tasks, each with 30 input-output pairs derived from the Alpaca dataset. Experiments with Llama-3-8B on QᴜɪᴛᴇAFᴇᴡ reveal: (1) task vector performance peaks at an intermediate layer (e.g., 15th), (2) effectiveness varies significantly by task type, and (3) complex tasks rely on multiple, subtask-specific vectors rather than a single vector, suggesting distributed task knowledge representation.