From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models

Qidong Wang, Junjie Hu, Ming Jiang


Abstract
Recent work has increasingly explored neuron-level interpretation in vision-language models (VLMs) to identify neurons critical to final predictions. However, existing neuron analyses generally focus on single tasks, limiting the comparability of neuron importance across tasks. Moreover, ranking strategies tend to score neurons in isolation, overlooking how task-dependent information pathways shape the write-in effects of feed-forward network (FFN) neurons. This oversight can exacerbate neuron polysemanticity in multi-task settings, introducing noise into both the identification of task-critical neurons and interventions on them. In this study, we propose HONES (**H**ead-**O**riented **N**euron **E**xplanation **S**teering), a gradient-free framework for task-aware neuron attribution and steering in multi-task VLMs. HONES ranks FFN neurons by their causal write-in contributions conditioned on task-relevant attention heads, and then modulates salient neurons via lightweight scaling. Experiments on four diverse multimodal tasks and two popular VLMs show that HONES outperforms existing methods in identifying task-critical neurons and improves model performance after steering. Our source code is released at: https://github.com/petergit1/HONES.
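As an illustration of the "lightweight scaling" idea mentioned in the abstract, the following is a minimal sketch of rescaling selected FFN neurons in a toy NumPy feed-forward block. It is not the HONES implementation: the head-conditioned causal ranking is replaced here by a simple write-in norm score, and all names (`ffn_forward`, `write_in_scores`, `alpha`, the layer sizes, the choice of four neurons) are assumptions introduced purely for illustration.

```python
import numpy as np

def ffn_forward(x, w_in, w_out, scale=None):
    """Toy FFN block: ReLU(x @ w_in) @ w_out, with optional per-neuron scaling."""
    hidden = np.maximum(x @ w_in, 0.0)      # neuron activations, shape (d_ff,)
    if scale is not None:
        hidden = hidden * scale             # steering: rescale chosen neurons
    return hidden @ w_out                   # sum of per-neuron write-in vectors

def write_in_scores(x, w_in, w_out):
    """Stand-in score (an assumption): norm of each write-in vector a_i * w_out[i]."""
    hidden = np.maximum(x @ w_in, 0.0)
    return np.abs(hidden) * np.linalg.norm(w_out, axis=1)

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32                       # hypothetical toy sizes
x = rng.normal(size=d_model)
w_in = rng.normal(size=(d_model, d_ff))
w_out = rng.normal(size=(d_ff, d_model))

scores = write_in_scores(x, w_in, w_out)
top_k = np.argsort(scores)[-4:]             # indices of the 4 highest-scoring neurons

alpha = 1.5                                 # hypothetical scaling factor
scale = np.ones(d_ff)
scale[top_k] = alpha

baseline = ffn_forward(x, w_in, w_out)
steered = ffn_forward(x, w_in, w_out, scale=scale)

# Scaling neuron i by alpha shifts the output by (alpha - 1) * a_i * w_out[i].
hidden = np.maximum(x @ w_in, 0.0)
expected_shift = (alpha - 1.0) * (hidden[top_k] @ w_out[top_k])
```

Because the block output is a sum of per-neuron write-in vectors, scaling a neuron changes the output only along that neuron's output direction, which is what makes this kind of intervention cheap to apply at inference time.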
Anthology ID:
2026.findings-acl.1802
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
36151–36175
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1802/
Cite (ACL):
Qidong Wang, Junjie Hu, and Ming Jiang. 2026. From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models. In Findings of the Association for Computational Linguistics: ACL 2026, pages 36151–36175, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
From Heads to Neurons: Causal Attribution and Steering in Multi-Task Vision–Language Models (Wang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1802.pdf
Checklist:
2026.findings-acl.1802.checklist.pdf