Where CoT Reasoning Commits: Entropy Traces Identify Interpretable Attention Heads
Tianhe Zhang, Yonghong Deng, Ping Jian, Zhen Yang, Boyang Wang, Xinyue Zhang
Abstract
While LLMs demonstrate impressive reasoning capabilities, their internal decision dynamics remain opaque. To render these processes interpretable and intervenable, we propose Dynamic Entropy Tracing, a mechanism-aware framework that interprets the evolving "choice state" of attention heads during CoT generation through stepwise head-wise option-logit and entropy tracing. Our analysis reveals distinct functional behaviors across attention heads: Steadfast Heads, characterized by consistently low entropy and producing a sharp, option-selective logit pattern with a stable top choice, and Wavering Heads, characterized by consistently high entropy and producing flat or oscillatory option logits without a persistent winner. Leveraging these traces, we identify a set of intervention targets and perform Selective Head Fine-Tuning, updating solely these selected heads against a frozen backbone. Experiments across the LLaMA and Qwen families reveal a striking plasticity hierarchy: fine-tuning just 30 Wavering Heads recovers over 98% of the performance achieved by full-parameter tuning, and in some settings modestly exceeds it. In contrast, intervening on Steadfast Heads yields much smaller gains. Our findings translate process-level mechanistic observables into a principled criterion for selective fine-tuning, offering a fundamental insight: the most effective tuning knobs are not the components that signal the final decision, but those that retain uncertainty, and thus plasticity, during its formation.
- Anthology ID:
- 2026.findings-acl.133
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2777–2795
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.133/
- Cite (ACL):
- Tianhe Zhang, Yonghong Deng, Ping Jian, Zhen Yang, Boyang Wang, and Xinyue Zhang. 2026. Where CoT Reasoning Commits: Entropy Traces Identify Interpretable Attention Heads. In Findings of the Association for Computational Linguistics: ACL 2026, pages 2777–2795, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Where CoT Reasoning Commits: Entropy Traces Identify Interpretable Attention Heads (Zhang et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.133.pdf
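
The abstract describes tracing head-wise option logits and their entropy across CoT steps, then picking high-entropy (Wavering) heads as fine-tuning targets. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the per-head option logits have already been extracted for each generation step; the abstract does not say how they are obtained. The function names, the toy data, and the rule of ranking heads by mean Shannon entropy are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert a list of option logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability options contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy_per_head(traces):
    """Average the per-step option entropy for every head.

    traces maps a hypothetical (layer, head) id to a list of per-step
    option-logit lists, one list per CoT generation step.
    """
    return {
        head: sum(entropy(softmax(step)) for step in steps) / len(steps)
        for head, steps in traces.items()
    }


def select_high_entropy_heads(traces, k):
    """Return the k heads with the highest mean entropy (assumed selection rule)."""
    scores = mean_entropy_per_head(traces)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy traces over four answer options and two steps (made-up numbers).
toy_traces = {
    (0, 1): [[5.0, 0.0, 0.0, 0.0], [6.0, 0.0, 0.0, 0.0]],  # sharp, stable top choice
    (3, 7): [[0.1, 0.0, 0.1, 0.0], [0.0, 0.1, 0.0, 0.1]],  # flat, no persistent winner
}

wavering_candidates = select_high_entropy_heads(toy_traces, k=1)
```

In this toy setup the flat head `(3, 7)` is the one selected, since its option distribution stays close to uniform at every step. The paper's actual thresholds and selection procedure may differ from this ranking.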