Yonghong Deng

2026

While LLMs demonstrate impressive reasoning capabilities, their internal decision dynamics remain opaque. To render these processes interpretable and intervenable, we propose Dynamic Entropy Tracing, a mechanism-aware framework that interprets the evolving "choice state" of attention heads during CoT generation through stepwise, head-wise option-logit and entropy tracing. Our analysis reveals two distinct functional behaviors among attention heads: Steadfast Heads, characterized by consistently low entropy and a sharp, option-selective logit pattern with a stable top choice, and Wavering Heads, characterized by consistently high entropy and flat or oscillatory option logits without a persistent winner. Leveraging these traces, we identify a set of intervention targets and perform Selective Head Fine-Tuning, updating solely the selected heads against a frozen backbone. Experiments across the LLaMA and Qwen families reveal a striking plasticity hierarchy: fine-tuning just 30 Wavering Heads recovers over 98% of the performance achieved by full-parameter tuning, and in some settings modestly exceeds it. In contrast, intervening on Steadfast Heads yields much smaller gains. Our findings translate process-level mechanistic observables into a principled criterion for selective fine-tuning, offering a fundamental insight: the most effective tuning knobs are not the components that signal the final decision, but those that retain uncertainty, and thus plasticity, during its formation.
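The Steadfast/Wavering distinction described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the entropy thresholds (`low`, `high`), and the toy traces are all hypothetical, and it assumes that each head's contribution has already been projected to one logit per answer option at every CoT step.

```python
import math


def option_entropy(logits):
    """Shannon entropy (in nats) of the softmax over one step's option logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)


def classify_head(trace, low=0.5, high=1.0):
    """Label one head from its stepwise option-logit trace.

    trace: list of per-step option-logit lists for a single head.
    low, high: hypothetical entropy thresholds, not values from the paper.
    """
    entropies = [option_entropy(step) for step in trace]
    top_choices = {max(range(len(step)), key=step.__getitem__) for step in trace}
    if max(entropies) < low and len(top_choices) == 1:
        return "steadfast"  # consistently low entropy, stable top choice
    if min(entropies) > high:
        return "wavering"   # consistently high entropy, no persistent winner
    return "mixed"


# Toy traces over four answer options and three CoT steps (illustrative only).
steadfast_trace = [[6.0, 0.0, 0.0, 0.0], [7.0, 0.5, 0.0, 0.0], [6.5, 0.0, 0.2, 0.0]]
wavering_trace = [[0.1, 0.0, 0.2, 0.1], [0.0, 0.3, 0.1, 0.0], [0.2, 0.1, 0.0, 0.3]]

print(classify_head(steadfast_trace))
print(classify_head(wavering_trace))
```

Under this sketch, the heads labeled "wavering" would be the candidates for selective fine-tuning, with all other parameters kept frozen.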


Over the past year, spatial intelligence has drawn increasing attention. Many prior works study it from the perspective of visual-spatial intelligence, where models have access to visuospatial information from visual inputs. However, whether linguistic intelligence alone is sufficient to endow models with spatial intelligence in the absence of visual information, and how models perform relevant tasks with text-only inputs, remain unexplored. Therefore, in this paper, we focus on a fundamental and critical capability in spatial intelligence from a linguistic perspective: viewpoint rotation understanding (VRU). Specifically, LLMs and VLMs are asked to infer their final viewpoint and predict the corresponding observation in an environment, given textual descriptions of viewpoint rotations and observations over multiple steps. We find that both LLMs and VLMs perform poorly on our proposed dataset while humans easily achieve 100% accuracy, indicating a substantial gap between current model capabilities and the requirements of spatial intelligence. To uncover the underlying mechanisms, we conduct a layer-wise probing analysis and head-wise causal intervention. Our findings reveal that although models encode viewpoint information in the hidden states, they appear to struggle to bind the viewpoint position to the corresponding observation, resulting in hallucination in the final layers. Finally, we selectively fine-tune the key attention heads identified by causal intervention to improve VRU performance. Experimental results demonstrate that such selective fine-tuning improves VRU performance while avoiding catastrophic forgetting of generic abilities.
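To make the VRU task concrete, the following sketch shows the bookkeeping a solver needs: track the heading across rotations, bind each observation to the heading at which it was seen, and read back the observation at the final heading. The episode format, the function name `track_viewpoint`, and the clockwise-positive degree convention are assumptions for illustration; the dataset's actual format is not given here.

```python
def track_viewpoint(start_observation, steps):
    """Follow a sequence of rotations and return (final_heading, observation).

    start_observation: what is seen at the initial heading (0 degrees).
    steps: list of (turn_degrees, observation) pairs; positive turns are
           taken to be clockwise (an assumed convention), and observation
           is None when the step does not report what is seen.
    """
    heading = 0
    seen = {0: start_observation}  # heading -> observation binding
    for turn, observation in steps:
        heading = (heading + turn) % 360
        if observation is not None:
            seen[heading] = observation
    # The answer is whatever was bound to the final heading, if anything.
    return heading, seen.get(heading)


# Hypothetical episode: turn right twice, then turn back 180 degrees.
episode = [(90, "window"), (90, "bookshelf"), (-180, None)]
print(track_viewpoint("door", episode))  # (0, 'door')
```

The heading-to-observation dictionary is the binding step that, according to the abstract, models appear to struggle with even when viewpoint information is encoded in their hidden states.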

Venues

ACL: 1
Findings: 1
